KV Block Manager — NVIDIA Dynamo Documentation
URL Source: https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html
Published Time: Fri, 07 Nov 2025 17:51:23 GMT
KV Block Manager
The Dynamo KV Block Manager (KVBM) is a scalable runtime component that handles memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It is designed to act as a unified memory layer for inference frameworks such as vLLM, SGLang, and TensorRT-LLM (TRT-LLM).
It offers:
- A unified memory API that spans GPU memory (in a future release), pinned host memory, remote RDMA-accessible memory, local or distributed pools of SSDs, and remote file, object, or cloud storage systems.
- Support for evolving block lifecycles (allocate → register → match) with event-based state transitions that storage backends can subscribe to.
- Integration with NIXL, a dynamic memory exchange layer used for remote registration, sharing, and access of memory blocks over RDMA/NVLink.
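To make the allocate → register → match lifecycle concrete, here is a minimal, self-contained sketch in Python. It is not the KVBM API: `BlockPool`, `Block`, `BlockEvent`, `seq_hash`, and the callback-based `subscribe` are hypothetical names chosen for illustration. It assumes that a block is first allocated as anonymous memory, then registered under a hash of the token sequence it holds, and that matching walks a list of sequence hashes and stops at the first miss (prefix-style matching, as in typical prefix caches). Each state transition emits an event that a subscriber, such as a storage tier, can observe.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class BlockState(Enum):
    ALLOCATED = auto()   # memory reserved, contents not yet identified
    REGISTERED = auto()  # contents published under a sequence hash


@dataclass
class BlockEvent:
    kind: str                       # hypothetical: "allocated" or "registered"
    block_id: int
    seq_hash: Optional[int] = None


@dataclass
class Block:
    block_id: int
    state: BlockState = BlockState.ALLOCATED
    seq_hash: Optional[int] = None


class BlockPool:
    """Hypothetical pool illustrating allocate -> register -> match."""

    def __init__(self, capacity: int) -> None:
        self._free: List[int] = list(range(capacity))
        self._blocks: Dict[int, Block] = {}
        self._registry: Dict[int, int] = {}  # seq_hash -> block_id
        self._subscribers: List[Callable[[BlockEvent], None]] = []

    def subscribe(self, callback: Callable[[BlockEvent], None]) -> None:
        # Subscribers (e.g. a storage tier) are notified on every transition.
        self._subscribers.append(callback)

    def _emit(self, event: BlockEvent) -> None:
        for callback in self._subscribers:
            callback(event)

    def allocate(self) -> Block:
        if not self._free:
            raise MemoryError("no free blocks")
        block = Block(self._free.pop())
        self._blocks[block.block_id] = block
        self._emit(BlockEvent("allocated", block.block_id))
        return block

    def register(self, block: Block, seq_hash: int) -> None:
        if block.state is not BlockState.ALLOCATED:
            raise ValueError("block already registered")
        block.state = BlockState.REGISTERED
        block.seq_hash = seq_hash
        self._registry[seq_hash] = block.block_id
        self._emit(BlockEvent("registered", block.block_id, seq_hash))

    def match(self, seq_hashes: List[int]) -> List[Block]:
        """Return registered blocks for the longest matching prefix of hashes."""
        matched: List[Block] = []
        for h in seq_hashes:
            block_id = self._registry.get(h)
            if block_id is None:
                break  # prefix matching stops at the first miss
            matched.append(self._blocks[block_id])
        return matched


# Example usage
events: List[BlockEvent] = []
pool = BlockPool(capacity=4)
pool.subscribe(events.append)  # stand-in for a storage backend listening for events
first = pool.allocate()
pool.register(first, seq_hash=101)
print([blk.seq_hash for blk in pool.match([101, 202])])  # [101]
```

Keeping registration separate from allocation means a block only becomes matchable once its contents are known, and subscribers see the same ordered stream of transitions regardless of which component triggered them.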
The Dynamo KV Block Manager serves as a reference implementation that emphasizes modularity and extensibility. Its pluggable design enables developers to customize components and optimize for specific performance, memory, and deployment needs.
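One way to picture the pluggable design together with the unified memory API is a common tier interface that each storage backend implements, with a front end that searches tiers from fastest to slowest. The Python sketch below is illustrative only: `StorageTier`, `DictTier`, `TieredStore`, and the promote-on-hit policy are hypothetical choices, not KVBM components or behavior. `DictTier` is an in-process stand-in for a real tier such as pinned host memory or local SSD.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class StorageTier(ABC):
    """Hypothetical plug-in interface for one memory/storage tier."""

    name: str

    @abstractmethod
    def get(self, seq_hash: int) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, seq_hash: int, data: bytes) -> None: ...


class DictTier(StorageTier):
    """In-process stand-in for a tier such as pinned host memory or local SSD."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[int, bytes] = {}

    def get(self, seq_hash: int) -> Optional[bytes]:
        return self._data.get(seq_hash)

    def put(self, seq_hash: int, data: bytes) -> None:
        self._data[seq_hash] = data


class TieredStore:
    """Searches tiers fastest-first and copies hits into the fastest tier."""

    def __init__(self, tiers: List[StorageTier]) -> None:
        if not tiers:
            raise ValueError("at least one tier is required")
        self.tiers = tiers  # ordered from fastest to slowest

    def lookup(self, seq_hash: int) -> Optional[Tuple[str, bytes]]:
        for tier in self.tiers:
            data = tier.get(seq_hash)
            if data is not None:
                if tier is not self.tiers[0]:
                    self.tiers[0].put(seq_hash, data)  # promote to fastest tier
                return tier.name, data
        return None


# Example usage
host, ssd = DictTier("host"), DictTier("ssd")
store = TieredStore([host, ssd])
ssd.put(7, b"kv-block")
print(store.lookup(7))  # ('ssd', b'kv-block'), now also copied into host
print(store.lookup(7))  # ('host', b'kv-block')
```

Because `TieredStore` depends only on the abstract interface, swapping in a different backend (for example, one backed by remote object storage) means writing another `StorageTier` subclass rather than changing the lookup logic.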
| Feature | Option | Supported |
|---|---|---|
| Backend | Local | ✅ |
| | Kubernetes | ✅ |
| LLM Framework | vLLM | ✅ |
| | TensorRT-LLM | ✅ |
| | SGLang | ❌ |
| Serving Type | Aggregated | ✅ |
| | Disaggregated | ✅ |