Python class

KVCacheBuffer

class max.nn.kv_cache.KVCacheBuffer(replicates_kv_across_tp, values, scales=None)

Bases: object

A collection of KVCache buffers for one data-parallel replica.

Two buffer kinds are supported: values and (optionally, for FP8 quantization) scales. The length of each list corresponds to the tensor-parallel degree, with one buffer per TP shard.

page_size and replicates_kv_across_tp describe the physical layout, so KV connectors can offload this cache without a separate KVCacheParams reference. replicates_kv_across_tp is True when the KV data is replicated identically across TP shards (as with MLA, multi-head latent attention) and False when it is sharded across them (as with MHA, multi-head attention).

Parameters:

replicates_kv_across_tp (bool)
values (list[Buffer])
scales (list[Buffer] | None)

all_buffers

property all_buffers: list[Buffer]

Returns all value and scale buffers in a single flat list.

Returns:

A list containing every value buffer followed by every scale buffer (if scales are present).
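The ordering described above (values first, then scales when present) can be sketched with a small stand-in class. `BufferSketch` is hypothetical and uses plain strings in place of real `Buffer` objects; it only illustrates the flattening order and is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class BufferSketch:
    """Hypothetical stand-in for KVCacheBuffer, for illustration only."""

    values: List[Any]
    scales: Optional[List[Any]] = None

    @property
    def all_buffers(self) -> List[Any]:
        # Every value buffer first, then every scale buffer if scales exist.
        return [*self.values, *(self.scales or [])]


# Two TP shards with FP8 scales, then two TP shards without scales.
with_scales = BufferSketch(values=["v0", "v1"], scales=["s0", "s1"])
without_scales = BufferSketch(values=["v0", "v1"])

print(with_scales.all_buffers)
print(without_scales.all_buffers)
```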

replicates_kv_across_tp

replicates_kv_across_tp: bool

scales

scales: list[Buffer] | None = None

to_memory()

to_memory()

Convert to a flat list of offload-ready memory units.

Each unit covers one buffer kind (values or scales) and one logical TP group. Non-replicated shards become individual KVCacheMemory entries; replicated shards become one ReplicatedKVCacheMemory entry (root + peers).
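The grouping rule in the paragraph above might look roughly like the following. `ShardUnit` and `ReplicatedUnit` are hypothetical stand-ins for `KVCacheMemory` and `ReplicatedKVCacheMemory`, whose real fields are not shown on this page. Choosing the first shard as the root and emitting values before scales are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Union


@dataclass
class ShardUnit:
    """Hypothetical stand-in for a KVCacheMemory entry (one shard)."""

    buffer: Any


@dataclass
class ReplicatedUnit:
    """Hypothetical stand-in for a ReplicatedKVCacheMemory entry."""

    root: Any
    peers: List[Any] = field(default_factory=list)


def group_units(
    replicates_kv_across_tp: bool,
    values: List[Any],
    scales: Optional[List[Any]] = None,
) -> List[Union[ShardUnit, ReplicatedUnit]]:
    units: List[Union[ShardUnit, ReplicatedUnit]] = []
    for kind in (values, scales):  # one pass per buffer kind
        if not kind:
            continue
        if replicates_kv_across_tp:
            # Replicated shards collapse into one unit: root + peers.
            # Assumption: the first shard acts as the root.
            units.append(ReplicatedUnit(root=kind[0], peers=list(kind[1:])))
        else:
            # Sharded data: every shard becomes its own unit.
            units.extend(ShardUnit(buffer=b) for b in kind)
    return units


sharded = group_units(False, ["v0", "v1"], ["s0", "s1"])
replicated = group_units(True, ["v0", "v1"])
print(len(sharded), len(replicated))  # 4 1
```

With two TP shards, the sharded case yields one unit per shard per buffer kind, while the replicated case yields a single unit per buffer kind.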

Every buffer is re-viewed as a 2-D [num_pages, bytes_per_page] uint8 array so the offload engine can treat all caches uniformly regardless of original dtype or shape.
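To illustrate what a `[num_pages, bytes_per_page]` uint8 view means, the sketch below applies the same idea to a standard-library `array` using `memoryview.cast`. It does not use the MAX `Buffer` API, and the helper name `as_byte_pages` is hypothetical. The point is only that the view is zero-copy and that the original dtype and shape disappear behind a uniform byte layout.

```python
from array import array


def as_byte_pages(buf, num_pages: int) -> memoryview:
    """Re-view a contiguous buffer as [num_pages, bytes_per_page] bytes."""
    raw = memoryview(buf).cast("B")  # flat, zero-copy view of the raw bytes
    if num_pages <= 0 or raw.nbytes % num_pages != 0:
        raise ValueError("buffer size is not a multiple of num_pages")
    bytes_per_page = raw.nbytes // num_pages
    return raw.cast("B", shape=[num_pages, bytes_per_page])


# A float cache holding 4 pages of 6 elements each.
cache = array("f", [0.0] * 24)
pages = as_byte_pages(cache, num_pages=4)

# Each page spans 6 * itemsize bytes, whatever the element type was.
print(pages.shape, cache.itemsize)
```

Because the result is a view rather than a copy, writes to `cache` are visible through `pages`.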

Returns:

A list of memory units ready for use by KV connectors and the offload engine.

Return type:

list[KVCacheMemory]

total_num_pages

property total_num_pages: int

Returns the total number of pages across all values and scales.
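One plausible reading of this total is the sum of per-buffer page counts over every value and scale buffer. The sketch below assumes that reading, and also assumes that the leading dimension of each buffer's shape is its page count; neither detail is confirmed on this page. The helper `total_pages_sketch` is hypothetical and works on shape tuples instead of real `Buffer` objects.

```python
from typing import List, Optional, Tuple

Shape = Tuple[int, ...]


def total_pages_sketch(
    values: List[Shape], scales: Optional[List[Shape]] = None
) -> int:
    # Assumption: shape[0] is the number of pages held by that buffer.
    shapes = [*values, *(scales or [])]
    return sum(shape[0] for shape in shapes)


# Two TP shards with 8 pages each, plus matching scale buffers.
total = total_pages_sketch([(8, 64), (8, 64)], [(8, 4), (8, 4)])
print(total)  # 32
```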

values

values: list[Buffer]