Python class

KVCacheBuffer

`KVCacheBuffer`

class max.nn.kv_cache.KVCacheBuffer(replicates_kv_across_tp, values, scales=None)

Bases: object

A collection of KVCache buffers for one data-parallel replica.

Two buffer kinds are supported: values and (optionally, for FP8 quantization) scales. The length of each list corresponds to the tensor-parallel degree, with one buffer per TP shard.

page_size and replicates_kv_across_tp describe the physical layout, so KV connectors can offload this cache without a separate KVCacheParams reference. replicates_kv_across_tp is True when the KV data is replicated identically across TP shards (MLA) and False when it is sharded across them (MHA).

Parameters:

replicates_kv_across_tp (bool)
values (list[Buffer])
scales (list[Buffer] | None)

`all_buffers`

property all_buffers: list[Buffer]

Returns all value and scale buffers in a single flat list.

Returns: A list containing every value buffer followed by every scale buffer (if scales are present).
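
The ordering described above can be illustrated with a stand-in class. `SketchKVCacheBuffer` is hypothetical and uses plain strings in place of `Buffer` objects; it mirrors the documented ordering only and should not be read as the library's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchKVCacheBuffer:
    """Hypothetical stand-in for KVCacheBuffer; strings stand in for Buffer objects."""

    replicates_kv_across_tp: bool
    values: list[str]
    scales: Optional[list[str]] = None

    @property
    def all_buffers(self) -> list[str]:
        # Every value buffer first, then every scale buffer when scales are present.
        return [*self.values, *(self.scales or [])]


# Tensor-parallel degree 2: one value buffer (and optionally one scale buffer) per shard.
with_scales = SketchKVCacheBuffer(False, ["v0", "v1"], ["s0", "s1"])
without_scales = SketchKVCacheBuffer(False, ["v0", "v1"])

flat_with_scales = with_scales.all_buffers
flat_without_scales = without_scales.all_buffers
```

With scales present the flat list holds the two value entries followed by the two scale entries; without scales it holds only the value entries.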

`replicates_kv_across_tp`

replicates_kv_across_tp: bool

`scales`

scales: list[Buffer] | None = None

`to_memory()`

to_memory()

Convert to a flat list of offload-ready memory units.

Each unit covers one buffer kind (values or scales) and one logical TP group. Non-replicated shards become individual KVCacheMemory entries; replicated shards become one ReplicatedKVCacheMemory entry (root + peers).
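
The grouping rule can be sketched in plain Python. `SketchMemory`, `SketchReplicatedMemory`, and `sketch_to_memory` are hypothetical names used for illustration, and treating the first shard as the root is an assumption; the source only says a replicated entry consists of a root plus peers.

```python
from dataclasses import dataclass


@dataclass
class SketchMemory:
    """Hypothetical stand-in for a KVCacheMemory entry: one kind, one shard."""

    kind: str
    buffer: str


@dataclass
class SketchReplicatedMemory:
    """Hypothetical stand-in for a ReplicatedKVCacheMemory entry: root plus peers."""

    kind: str
    root: str
    peers: list[str]


def sketch_to_memory(replicated, values, scales=None):
    """Group per-shard buffers into memory units, one buffer kind at a time."""
    units = []
    for kind, shards in (("values", values), ("scales", scales)):
        if not shards:
            continue  # scales are optional
        if replicated:
            # Assumption: the first shard is the root, the rest are peers.
            units.append(SketchReplicatedMemory(kind, shards[0], list(shards[1:])))
        else:
            # Sharded data: every shard becomes its own entry.
            units.extend(SketchMemory(kind, shard) for shard in shards)
    return units


sharded_units = sketch_to_memory(False, ["v0", "v1"], ["s0", "s1"])
replicated_units = sketch_to_memory(True, ["v0", "v1"], ["s0", "s1"])
```

Under these assumptions, a sharded cache with two TP shards and scales yields four units, while a replicated one yields two, one per buffer kind.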

Every buffer is re-viewed as a 2-D [num_pages, bytes_per_page] uint8 array so the offload engine can treat all caches uniformly regardless of original dtype or shape.
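
The byte-level re-view can be illustrated with the standard library alone. In this sketch `array` objects stand in for device buffers and `as_byte_pages` is a hypothetical helper; the actual method operates on `Buffer` objects, whose API is not shown here.

```python
from array import array


def as_byte_pages(buffer, num_pages: int) -> memoryview:
    """Re-view a contiguous buffer as [num_pages, bytes_per_page] bytes without copying."""
    flat = memoryview(buffer).cast("B")  # 1-D view over the raw bytes
    if num_pages <= 0 or flat.nbytes % num_pages != 0:
        raise ValueError("buffer size is not a multiple of num_pages")
    return flat.cast("B", shape=[num_pages, flat.nbytes // num_pages])


# Two buffers with different element types, both holding 4 pages.
values = array("f", [0.0] * 24)  # 6 float elements per page
scales = array("B", [0] * 8)     # 2 byte-sized elements per page

value_pages = as_byte_pages(values, num_pages=4)
scale_pages = as_byte_pages(scales, num_pages=4)
```

Both results have the same element format and rank, so code that copies pages only needs the page index and the bytes-per-page count, whatever the original element type was.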

Returns: A list of memory units ready for use by KV connectors and the offload engine.
Return type: list[KVCacheMemory]

`total_num_pages`

property total_num_pages: int

Returns the total number of pages across all values and scales.
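
As a rough worked example, assuming the total is a plain sum of per-buffer page counts (the source does not say how each buffer's page count is obtained), the arithmetic looks like this. `sketch_total_num_pages` is a hypothetical helper that takes page counts directly.

```python
def sketch_total_num_pages(value_page_counts, scale_page_counts=None):
    """Sum page counts over every value buffer and, if present, every scale buffer."""
    # Assumption: the total is a simple sum across all buffers of both kinds.
    return sum(value_page_counts) + sum(scale_page_counts or [])


# Tensor-parallel degree 2, 128 pages per buffer, with and without scales.
total_with_scales = sketch_total_num_pages([128, 128], [128, 128])
total_without_scales = sketch_total_num_pages([128, 128])
```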

`values`

values: list[Buffer]
