# Finding Best Initial Configs using AIConfigurator

Source: https://docs.nvidia.com/dynamo/latest/performance/aiconfigurator.html (NVIDIA Dynamo Documentation, published 2025-11-07)

AIConfigurator is a performance optimization tool that helps you find the best configuration for deploying LLMs with Dynamo. It automatically determines the number of prefill and decode workers, the parallelism settings, and the deployment parameters that meet your SLA targets while maximizing throughput.
## Why Use AIConfigurator?

When deploying LLMs with Dynamo, you need to make several critical decisions:

- **Aggregated vs. Disaggregated**: Which architecture gives better performance for your workload?
- **Worker Configuration**: How many prefill and decode workers should you deploy?
- **Parallelism Settings**: What tensor/pipeline parallel configuration should you use?
- **SLA Compliance**: How do you meet your TTFT and TPOT targets?

AIConfigurator answers these questions in seconds, providing:

- Optimal configurations that meet your SLA requirements
- Ready-to-deploy Dynamo configuration files
- Performance comparisons between different deployment strategies
- Up to 1.7x better throughput compared to manual configuration
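Because AIConfigurator optimizes against TTFT and TPOT targets, it helps to be precise about what those metrics mean. The sketch below shows one common way to compute them from per-request timings; the `RequestTiming` type and helper names are illustrative, not part of AIConfigurator's API.

```python
from dataclasses import dataclass

@dataclass
class RequestTiming:
    first_token_ms: float  # request start -> first output token
    total_ms: float        # request start -> last output token
    output_tokens: int

def ttft_ms(t: RequestTiming) -> float:
    """Time To First Token: latency until the first token is emitted."""
    return t.first_token_ms

def tpot_ms(t: RequestTiming) -> float:
    """Time Per Output Token: average inter-token latency after the first token."""
    return (t.total_ms - t.first_token_ms) / max(t.output_tokens - 1, 1)

def meets_sla(timings, ttft_target_ms, tpot_target_ms):
    """True if every request satisfies both latency targets."""
    return all(
        ttft_ms(t) <= ttft_target_ms and tpot_ms(t) <= tpot_target_ms
        for t in timings
    )
```

For example, with the Quick Start targets (`--ttft 300 --tpot 10`), a 500-token response whose first token arrives at 277 ms and whose last token arrives at 4429 ms has a TPOT of (4429 − 277)/499 ≈ 8.32 ms and passes both checks.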
## Quick Start

Install:

```shell
pip3 install aiconfigurator
```

Find an optimal configuration:

```shell
aiconfigurator cli default \
  --model QWEN3_32B \
  --total_gpus 32 \
  --system h200_sxm \
  --isl 4000 \
  --osl 500 \
  --ttft 300 \
  --tpot 10 \
  --save_dir ./dynamo-configs
```

- `--model`: model name (QWEN3_32B, LLAMA3.1_70B, etc.)
- `--total_gpus`: number of available GPUs
- `--system`: GPU type (h100_sxm, h200_sxm, a100_sxm)
- `--isl` / `--osl`: input / output sequence length (tokens)
- `--ttft`: target Time To First Token (ms)
- `--tpot`: target Time Per Output Token (ms)
- `--save_dir`: output directory for the generated configs

Deploy:

```shell
kubectl apply -f ./dynamo-configs/disagg/top1/disagg/k8s_deploy.yaml
```
## Example Output

```text
*** Dynamo aiconfigurator Final Results ***

Input Configuration & SLA Target:
  Model: QWEN3_32B (is_moe: False)
  Total GPUs: 32
  Best Experiment Chosen: disagg at 812.92 tokens/s/gpu (1.70x better)

Overall Best Configuration:
  - Best Throughput: 812.92 tokens/s/gpu
  - User Throughput: 120.23 tokens/s/user
  - TTFT: 276.76 ms
  - TPOT: 8.32 ms
```
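The reported numbers are internally consistent: per-user throughput is approximately the reciprocal of TPOT (tokens per second seen by one request stream), and total cluster throughput is per-GPU throughput times the GPU count. A quick arithmetic check (plain Python, not an AIConfigurator API):

```python
tpot_ms = 8.32      # reported TPOT
gpus = 32           # --total_gpus
per_gpu = 812.92    # reported tokens/s/gpu

# One user's stream emits one token every TPOT milliseconds.
per_user = 1000.0 / tpot_ms    # ~120.2 tokens/s/user (reported: 120.23)

# Aggregate throughput across the whole deployment.
cluster = per_gpu * gpus       # ~26013 tokens/s
```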
Pareto Frontier: the output also renders an ASCII plot ("QWEN3_32B Pareto Frontier: tokens/s/gpu vs tokens/s/user") showing the aggregated (`agg`) and disaggregated (`disagg`) configuration curves, with the best disaggregated point marked.
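The plot is a Pareto frontier: the set of configurations for which no other configuration is at least as good on both axes and strictly better on one. A small sketch of how such a frontier can be computed from candidate (tokens/s/gpu, tokens/s/user) points (the data below is illustrative, not AIConfigurator output):

```python
def pareto_frontier(points):
    """Return the non-dominated points: p is dominated if some other
    point q is >= p on both axes (and q is not identical to p)."""
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

# Hypothetical candidate configurations: (tokens/s/gpu, tokens/s/user)
candidates = [(800, 120), (600, 150), (700, 100), (500, 80)]
best = pareto_frontier(candidates)  # (700,100) and (500,80) are dominated
```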
- Performance Comparison: Shows disaggregated vs aggregated serving performance
- Optimal Configuration: The best configuration that meets your SLA targets
- Deployment Files: Ready-to-use Dynamo configuration files
## Key Features

### Fast Profiling Integration

Use AIConfigurator with Dynamo's SLA planner to replace hours of profiling with a 20-30 second estimate:

```shell
python3 -m benchmarks.profiler.profile_sla \
  --config ./components/backends/trtllm/deploy/disagg.yaml \
  --backend trtllm \
  --use-ai-configurator \
  --aic-system h200_sxm \
  --aic-model-name QWEN3_32B
```

### Custom Configuration

For advanced users, define a custom search space:

```shell
aiconfigurator cli exp --yaml_path custom_config.yaml
```

## Common Use Cases

```shell
# Strict SLAs (low latency)
aiconfigurator cli default --model QWEN2.5_7B --total_gpus 8 --system h200_sxm --ttft 100 --tpot 5

# High throughput (relaxed latency)
aiconfigurator cli default --model QWEN3_32B --total_gpus 32 --system h200_sxm --ttft 1000 --tpot 50
```

## Supported Configurations

- **Models**: GPT, LLAMA2/3, QWEN2.5/3, Mixtral, DEEPSEEK_V3
- **GPUs**: H100, H200, A100, B200 (preview), GB200 (preview)
- **Backend**: TensorRT-LLM (vLLM and SGLang coming soon)

## Additional Options

```shell
# Web interface
aiconfigurator webapp    # Visit http://127.0.0.1:7860

# Docker
docker run -it --rm nvcr.io/nvidia/aiconfigurator:latest \
  aiconfigurator cli default --model LLAMA3.1_70B --total_gpus 16 --system h100_sxm
```
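When sweeping several SLA presets such as the strict-latency and high-throughput cases above, it can be convenient to assemble the CLI invocation programmatically. This is a hedged sketch using only flags shown in this document; the `aiconfigurator_args` helper is ours, and the resulting list would be run with `subprocess.run(args)` once `aiconfigurator` is installed.

```python
def aiconfigurator_args(model, total_gpus, system, ttft, tpot,
                        isl=4000, osl=500, save_dir="./dynamo-configs"):
    """Assemble an `aiconfigurator cli default` argument list."""
    return [
        "aiconfigurator", "cli", "default",
        "--model", model,
        "--total_gpus", str(total_gpus),
        "--system", system,
        "--isl", str(isl),
        "--osl", str(osl),
        "--ttft", str(ttft),
        "--tpot", str(tpot),
        "--save_dir", save_dir,
    ]

# Strict SLAs (low latency)
strict = aiconfigurator_args("QWEN2.5_7B", 8, "h200_sxm", ttft=100, tpot=5)

# High throughput (relaxed latency)
bulk = aiconfigurator_args("QWEN3_32B", 32, "h200_sxm", ttft=1000, tpot=50)
```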
## Troubleshooting

- **Model name mismatch**: Use the exact model name that matches your deployment.
- **GPU allocation**: Verify that the available GPUs match `--total_gpus`.
- **Performance variance**: Results are estimates; benchmark the actual deployment.

## Learn More

- [AIConfigurator](https://github.com/ai-dynamo/aiconfigurator)
- [Dynamo Installation Guide](https://docs.nvidia.com/dynamo/latest/kubernetes/installation_guide.html)
- [SLA Planner Quick Start Guide](https://docs.nvidia.com/dynamo/latest/planner/sla_planner_quickstart.html)
- [Benchmarking Guide](https://docs.nvidia.com/dynamo/latest/benchmarks/benchmarking.html)