IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Python class

KVCacheMemory

class max.nn.kv_cache.KVCacheMemory(buffer)

Bases: object

A single KV cache shard as a 2-D uint8 view.

buffer has shape [num_pages, bytes_per_page] and dtype uint8: each row holds the raw bytes of one page. This is the form consumed by the offload engine and KV connectors. ReplicatedKVCacheMemory subclasses this class for caches that are replicated across tensor-parallel (TP) shards, as with multi-head latent attention (MLA).
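The byte-level layout described above can be illustrated with a minimal sketch. It uses a NumPy array as a stand-in for a Buffer and does not call the MAX API. The page geometry (page_shape, float16 elements) is a hypothetical choice made only for this example.

```python
import numpy as np

# Hypothetical page geometry, chosen only for illustration.
num_pages = 4
page_shape = (2, 8, 16)  # e.g. (key/value, tokens per page, head dim)
dtype = np.float16

# Typed cache storage: one block of page_shape per page.
typed = np.zeros((num_pages, *page_shape), dtype=dtype)

# Reinterpret the same memory as raw bytes:
# shape [num_pages, bytes_per_page], dtype uint8.
bytes_per_page = int(np.prod(page_shape)) * np.dtype(dtype).itemsize
byte_view = typed.reshape(num_pages, -1).view(np.uint8)

# Writing into one typed page changes only that row of the byte view.
typed[1] = 1.0
```

Because the byte view shares memory with the typed array, a consumer that only moves pages around can copy whole rows without knowing the element type or the per-page layout.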

Parameters:

buffer (Buffer)

buffer

buffer: Buffer

total_num_pages

property total_num_pages: int

Returns the total number of pages.
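As a rough illustration of how a page count relates to a [num_pages, bytes_per_page] buffer, here is a hypothetical stand-in class. PagedByteView is not part of MAX, it wraps a NumPy array rather than a Buffer, and it assumes the page count is simply the first dimension of the buffer, which this page does not state.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PagedByteView:
    """Hypothetical stand-in for illustration; not the MAX class."""

    buffer: np.ndarray  # expected shape [num_pages, bytes_per_page], uint8

    def __post_init__(self) -> None:
        # Reject anything that is not a 2-D uint8 array.
        if self.buffer.ndim != 2 or self.buffer.dtype != np.uint8:
            raise ValueError("buffer must be a 2-D uint8 array")

    @property
    def total_num_pages(self) -> int:
        # Assumption: the page count is the first dimension of the buffer.
        return int(self.buffer.shape[0])


shard = PagedByteView(np.zeros((8, 4096), dtype=np.uint8))
num_pages = shard.total_num_pages
```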