
Python module

max.pipelines.kv_cache

KV cache management for MAX pipelines.

Cache manager

DummyKVCache: No-op KV cache implementation, used for testing or when the cache is disabled.
IncrementCacheLengthsProcessor: Processes KV cache length increments after each decoding step.
InsufficientBlocksError: Exception raised when there are not enough free blocks to satisfy an allocation.
PagedKVCacheManager: Paged KV cache manager with support for data and tensor parallelism.
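To make the paged model concrete, here is a minimal, self-contained sketch of paged KV cache bookkeeping. It assumes a fixed pool of equally sized blocks (pages), each holding page_size tokens, with every sequence owning a list of block IDs. All names here (ToyPagedAllocator, ToyInsufficientBlocksError, reserve, step, release) are hypothetical illustrations, not the MAX API. The exception plays the role described for InsufficientBlocksError, and step mirrors the per-decode-step length increment described for IncrementCacheLengthsProcessor.

```python
from dataclasses import dataclass, field


class ToyInsufficientBlocksError(Exception):
    """Hypothetical stand-in for InsufficientBlocksError (not the MAX class)."""


@dataclass
class ToyPagedAllocator:
    """Hypothetical paged KV cache bookkeeping over a shared pool of blocks."""

    total_blocks: int
    page_size: int  # tokens stored per block
    free_blocks: list[int] = field(init=False)
    block_tables: dict[str, list[int]] = field(init=False, default_factory=dict)
    cache_lengths: dict[str, int] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Every block starts out free.
        self.free_blocks = list(range(self.total_blocks))

    def _blocks_needed(self, num_tokens: int) -> int:
        # Ceiling division: pages required to hold num_tokens tokens.
        return -(-num_tokens // self.page_size)

    def reserve(self, seq_id: str, new_tokens: int) -> None:
        """Make sure seq_id owns enough blocks for new_tokens more tokens."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.cache_lengths.get(seq_id, 0)
        extra = self._blocks_needed(length + new_tokens) - len(table)
        if extra > len(self.free_blocks):
            raise ToyInsufficientBlocksError(
                f"need {extra} blocks, only {len(self.free_blocks)} free"
            )
        for _ in range(max(extra, 0)):
            table.append(self.free_blocks.pop())

    def step(self, seq_id: str, new_tokens: int = 1) -> None:
        """After a decoding step, advance the sequence's cached length."""
        self.cache_lengths[seq_id] = self.cache_lengths.get(seq_id, 0) + new_tokens

    def release(self, seq_id: str) -> None:
        """Return all of the sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.cache_lengths.pop(seq_id, None)
```

For example, with total_blocks=4 and page_size=16, reserving 20 tokens for one sequence takes 2 blocks; a second sequence that asks for 40 tokens (3 blocks) then raises ToyInsufficientBlocksError because only 2 blocks remain free. Allocating whole pages on demand is what lets a paged manager avoid reserving the maximum sequence length up front.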

Transfer engine

KVTransferEngine: KV cache transfer engine with support for data parallelism (DP) and tensor parallelism (TP).
KVTransferEngineMetadata: Metadata associated with a transfer engine.
TransferReqData: Metadata associated with a transfer request.
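The sketch below illustrates why tensor parallelism matters when moving KV cache blocks between instances. It assumes that TP shards the KV heads evenly across devices, so each device holds only its slice of every block and one logical block transfer becomes one copy per TP shard. ToyCacheLayout, ToyTransferRequest, and plan_copies are hypothetical names; they do not describe the actual fields of KVTransferEngineMetadata or TransferReqData. For simplicity the sketch covers a single data-parallel replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCacheLayout:
    """Hypothetical per-block KV cache layout (not the MAX API)."""

    num_layers: int
    page_size: int     # tokens per block
    num_kv_heads: int  # total KV heads before tensor-parallel sharding
    head_dim: int
    dtype_bytes: int   # e.g. 2 for bfloat16

    def block_bytes_per_shard(self, tp_degree: int) -> int:
        # Tensor parallelism is assumed to split KV heads evenly across devices.
        if self.num_kv_heads % tp_degree:
            raise ValueError("num_kv_heads must be divisible by tp_degree")
        heads = self.num_kv_heads // tp_degree
        # The leading factor of 2 accounts for both the K and the V tensors.
        return 2 * self.num_layers * self.page_size * heads * self.head_dim * self.dtype_bytes


@dataclass(frozen=True)
class ToyTransferRequest:
    """Hypothetical transfer metadata: which source blocks map to which destination blocks."""

    src_blocks: tuple[int, ...]
    dst_blocks: tuple[int, ...]


def plan_copies(
    req: ToyTransferRequest, layout: ToyCacheLayout, tp_degree: int
) -> list[tuple[int, int, int, int]]:
    """Expand one request into (tp_shard, src_block, dst_block, nbytes) copies."""
    if len(req.src_blocks) != len(req.dst_blocks):
        raise ValueError("source and destination block lists must have equal length")
    nbytes = layout.block_bytes_per_shard(tp_degree)
    return [
        (shard, src, dst, nbytes)
        for shard in range(tp_degree)
        for src, dst in zip(req.src_blocks, req.dst_blocks)
    ]
```

With 8 KV heads and tp_degree=2, each device moves half of every block, so a two-block request expands into four per-shard copies, each half the size of a full block.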

Factory functions

available_port: Finds an available TCP port in the given range.
load_kv_manager: Loads a KV cache manager from the given parameters.
load_multi_kv_managers: Loads a list of KV cache managers from the given parameters.
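A common way to find a free TCP port is to try binding a socket to each port in a range and return the first one that succeeds. The following is a minimal sketch of that technique using only the standard library. find_available_port, its default range, the half-open range semantics, and binding to 127.0.0.1 are all assumptions made for illustration; this is not the signature or implementation of available_port.

```python
import socket


def find_available_port(start: int = 8000, end: int = 9000) -> int:
    """Return the first TCP port in [start, end) that can be bound on localhost.

    Hypothetical helper illustrating the idea behind available_port;
    the name, defaults, and range semantics are assumptions.
    """
    for port in range(start, end):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # Port is in use or not permitted; try the next one.
            return port  # The socket is closed on leaving the with block.
    raise RuntimeError(f"no available TCP port in range [{start}, {end})")
```

Because the probe socket is closed before the port is returned, another process can claim the port in the meantime. Callers should therefore bind the port promptly and be ready to retry.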