KVBM Integrations — NVIDIA Dynamo Documentation

Last updated: 11/7/2025

URL Source: https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_integrations.html

Published Time: Fri, 07 Nov 2025 17:51:23 GMT

KVBM Integrations

KVBM integrates with inference frameworks (vLLM, TRTLLM, SGLang) via Connector APIs to influence KV caching behaviour, scheduling, and forward pass execution. The interface has two components: the Scheduler and the Worker. The Scheduler (leader) orchestrates KV block offload and onboard, and builds the metadata that tells the workers which data to transfer. It also maintains hooks for handling asynchronous transfer completion. The Worker reads the metadata built by the Scheduler (leader) and performs asynchronous onboarding and offloading at the end of the forward pass.
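The division of labour between the two components can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the KVBM or vLLM connector API: every name here (`SchedulerLeader`, `Worker`, `ConnectorMetadata`, `Transfer`, `run_step`) is hypothetical, and the transfers are simulated synchronously, whereas the real worker performs them asynchronously.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Direction(Enum):
    ONBOARD = "onboard"  # host or disk -> device
    OFFLOAD = "offload"  # device -> host or disk


@dataclass
class Transfer:
    request_id: str
    block_ids: List[int]
    direction: Direction


@dataclass
class ConnectorMetadata:
    """Hypothetical per-step metadata handed from the leader to the workers."""
    transfers: List[Transfer] = field(default_factory=list)


class SchedulerLeader:
    """Decides which blocks move, and tracks transfers until they complete."""

    def __init__(self) -> None:
        self._pending: List[Transfer] = []
        self._in_flight: Dict[str, int] = {}  # request_id -> outstanding transfers

    def request_transfer(self, request_id: str, block_ids: List[int],
                         direction: Direction) -> None:
        self._pending.append(Transfer(request_id, list(block_ids), direction))
        self._in_flight[request_id] = self._in_flight.get(request_id, 0) + 1

    def build_metadata(self) -> ConnectorMetadata:
        # Hand over everything queued since the last step, then reset the queue.
        meta = ConnectorMetadata(transfers=self._pending)
        self._pending = []
        return meta

    def on_transfer_finished(self, request_id: str) -> None:
        # Completion hook: called once per finished transfer.
        remaining = self._in_flight[request_id] - 1
        if remaining:
            self._in_flight[request_id] = remaining
        else:
            del self._in_flight[request_id]

    def has_in_flight(self, request_id: str) -> bool:
        return request_id in self._in_flight


class Worker:
    """Reads the leader's metadata and moves blocks after the forward pass."""

    def __init__(self) -> None:
        self._meta = ConnectorMetadata()
        self.log: List[Tuple[str, str, Tuple[int, ...]]] = []

    def bind_metadata(self, meta: ConnectorMetadata) -> None:
        self._meta = meta

    def finish_forward_pass(self) -> List[str]:
        # Simulated transfers: record each one and report it as finished.
        finished = []
        for t in self._meta.transfers:
            self.log.append((t.direction.value, t.request_id, tuple(t.block_ids)))
            finished.append(t.request_id)
        self._meta = ConnectorMetadata()
        return finished


def run_step(leader: SchedulerLeader, worker: Worker) -> None:
    worker.bind_metadata(leader.build_metadata())
    # The model's forward pass would run here.
    for request_id in worker.finish_forward_pass():
        leader.on_transfer_finished(request_id)
```

A single step might then look like `leader.request_transfer("req-1", [0, 1], Direction.OFFLOAD)` followed by `run_step(leader, worker)`. Keeping the in-flight bookkeeping on the leader side mirrors the description above, where the Scheduler owns the completion hooks and the Worker only executes what the metadata specifies.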

Typical KVBM Integrations

The following figure shows a typical integration of KVBM with an inference framework, using vLLM as the example.

Image 1: vLLM KVBM Integration

How to run KVBM with Frameworks

Onboarding

Image 2: Onboarding blocks from Host to Device

Image 3: Onboarding blocks from Disk to Device

Offloading

Image 4: Offloading blocks from Device to Host & Disk
