Finding Best Initial Configs using AIConfigurator — NVIDIA Dynamo Documentation

Last updated: 11/7/2025

Finding Best Initial Configs using AIConfigurator#

AIConfigurator is a performance optimization tool that helps you find the optimal configuration for deploying LLMs with Dynamo. It automatically determines the best number of prefill and decode workers, the parallelism settings, and the deployment parameters that meet your SLA targets while maximizing throughput.

Why Use AIConfigurator?#

When deploying LLMs with Dynamo, you need to make several critical decisions:

  • Aggregated vs Disaggregated: Which architecture gives better performance for your workload?

  • Worker Configuration: How many prefill and decode workers to deploy?

  • Parallelism Settings: What tensor/pipeline parallel configuration to use?

  • SLA Compliance: How to meet your TTFT and TPOT targets?

AIConfigurator answers these questions in seconds, providing:

  • Optimal configurations that meet your SLA requirements

  • Ready-to-deploy Dynamo configuration files

  • Performance comparisons between different deployment strategies

  • Up to 1.7x better throughput compared to manual configuration

Quick Start#

Install:

```shell
pip3 install aiconfigurator
```

Find the optimal configuration:

```shell
# --model       Model name (QWEN3_32B, LLAMA3.1_70B, etc.)
# --total_gpus  Number of available GPUs
# --system      GPU type (h100_sxm, h200_sxm, a100_sxm)
# --isl         Input sequence length (tokens)
# --osl         Output sequence length (tokens)
# --ttft        Target Time To First Token (ms)
# --tpot        Target Time Per Output Token (ms)
aiconfigurator cli default \
  --model QWEN3_32B \
  --total_gpus 32 \
  --system h200_sxm \
  --isl 4000 \
  --osl 500 \
  --ttft 300 \
  --tpot 10 \
  --save_dir ./dynamo-configs
```

Deploy:

```shell
kubectl apply -f ./dynamo-configs/disagg/top1/disagg/k8s_deploy.yaml
```

Example Output#

* Dynamo aiconfigurator Final Results *

Input Configuration & SLA Target:
- Model: QWEN3_32B (is_moe: False)
- Total GPUs: 32
- Best Experiment Chosen: disagg at 812.92 tokens/s/gpu (1.70x better)

Overall Best Configuration:
- Best Throughput: 812.92 tokens/s/gpu
- User Throughput: 120.23 tokens/s/user
- TTFT: 276.76ms
- TPOT: 8.32ms

The output also renders an ASCII Pareto-frontier plot for QWEN3_32B (tokens/s/gpu vs tokens/s/user), comparing disagg and agg configurations and marking the best disagg point.

  1. Performance Comparison: Shows disaggregated vs aggregated serving performance
  2. Optimal Configuration: The best configuration that meets your SLA targets
  3. Deployment Files: Ready-to-use Dynamo configuration files
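The selection logic can be illustrated with a small sketch (hypothetical data structures, not AIConfigurator's internal API): a configuration qualifies when its measured TTFT and TPOT fall at or below the targets, and the winner is the qualifying configuration with the highest per-GPU throughput.

```python
# Minimal sketch of SLA filtering and selection. The Candidate type and
# best_under_sla helper are hypothetical; the first candidate's numbers
# come from the example output above, the second is invented.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    tokens_per_s_per_gpu: float
    ttft_ms: float   # measured time to first token
    tpot_ms: float   # measured time per output token

def best_under_sla(candidates, ttft_target_ms, tpot_target_ms):
    """Keep configs that meet both latency targets; pick the highest throughput."""
    ok = [c for c in candidates
          if c.ttft_ms <= ttft_target_ms and c.tpot_ms <= tpot_target_ms]
    return max(ok, key=lambda c: c.tokens_per_s_per_gpu) if ok else None

candidates = [
    Candidate("disagg_top1", 812.92, 276.76, 8.32),  # from the example output
    Candidate("agg_best",    478.00, 310.00, 9.10),  # hypothetical: misses TTFT
]
winner = best_under_sla(candidates, ttft_target_ms=300, tpot_target_ms=10)
print(winner.name)  # disagg_top1
```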

Key Features#

Fast Profiling Integration#

```shell
# Use with Dynamo's SLA planner (20-30 seconds vs hours)
python3 -m benchmarks.profiler.profile_sla \
   --config ./components/backends/trtllm/deploy/disagg.yaml \
   --backend trtllm \
   --use-ai-configurator \
   --aic-system h200_sxm \
   --aic-model-name QWEN3_32B
```

Custom Configuration#

```shell
# For advanced users: define a custom search space
aiconfigurator cli exp --yaml_path custom_config.yaml
```
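The schema of the search-space YAML is defined by AIConfigurator itself; see the project repository for the authoritative format. Purely to illustrate the idea of narrowing a search space, a hypothetical sketch (every key below is illustrative, not the real schema) might constrain serving mode, parallelism, and SLA targets:

```yaml
# Hypothetical sketch only -- not the real AIConfigurator schema.
# Illustrates the kind of constraints a custom search space expresses.
model: QWEN3_32B
system: h200_sxm
total_gpus: 32
search_space:
  serving_mode: [disagg]    # restrict the search to disaggregated serving
  prefill_tp: [2, 4]        # tensor-parallel sizes to try for prefill workers
  decode_tp: [4, 8]         # tensor-parallel sizes to try for decode workers
sla:
  ttft_ms: 300
  tpot_ms: 10
```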

Common Use Cases#

```shell
# Strict SLAs (low latency)
aiconfigurator cli default --model QWEN2.5_7B --total_gpus 8 --system h200_sxm --ttft 100 --tpot 5

# High throughput (relaxed latency)
aiconfigurator cli default --model QWEN3_32B --total_gpus 32 --system h200_sxm --ttft 1000 --tpot 50
```
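The TTFT and TPOT targets jointly bound end-to-end latency for a streamed response: time to first token, plus per-token time for each remaining output token. A quick sketch using the targets from the two cases above (the use-case commands do not set `--osl`, so 500 output tokens is assumed here, matching the Quick Start example):

```python
def e2e_latency_ms(ttft_ms: float, tpot_ms: float, osl: int) -> float:
    """End-to-end latency of a streamed response with osl output tokens:
    time to first token, plus per-token time for the remaining tokens."""
    return ttft_ms + tpot_ms * (osl - 1)

# Strict-SLA case: ttft=100ms, tpot=5ms, assuming 500 output tokens
print(e2e_latency_ms(100, 5, 500))    # 2595.0 ms
# Relaxed case: ttft=1000ms, tpot=50ms
print(e2e_latency_ms(1000, 50, 500))  # 25950.0 ms
```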

Supported Configurations#

**Models**: GPT, LLAMA2/3, QWEN2.5/3, Mixtral, DEEPSEEK_V3
**GPUs**: H100, H200, A100, B200 (preview), GB200 (preview)
**Backend**: TensorRT-LLM (vLLM and SGLang coming soon)

Additional Options#

```shell
# Web interface (visit http://127.0.0.1:7860)
aiconfigurator webapp
```

```shell
# Docker
docker run -it --rm nvcr.io/nvidia/aiconfigurator:latest \
  aiconfigurator cli default --model LLAMA3.1_70B --total_gpus 16 --system h100_sxm
```

Troubleshooting#

**Model name mismatch**: Use the exact model name that matches your deployment.
**GPU allocation**: Verify that the available GPUs match `--total_gpus`.
**Performance variance**: Results are estimates; benchmark the actual deployment.

Learn More#

*   [Dynamo Installation Guide](https://docs.nvidia.com/dynamo/latest/performance/aiconfigurator.html.md#/docs/kubernetes/installation_guide.md)

*   [SLA Planner Quick Start Guide](https://docs.nvidia.com/dynamo/latest/performance/aiconfigurator.html.md#/docs/planner/sla_planner_quickstart.md)

*   [Benchmarking Guide](https://docs.nvidia.com/dynamo/latest/performance/aiconfigurator.html.md#/docs/benchmarks/benchmarking.md)

*   [AIConfigurator on GitHub](https://github.com/ai-dynamo/aiconfigurator/tree/main)
