Python class
KVCacheMemory
class max.nn.kv_cache.KVCacheMemory(buffer)
Bases: object
A single KV cache shard, exposed as a 2-D uint8 view.
buffer has shape [num_pages, bytes_per_page] and dtype uint8. This is the form consumed by the offload engine and KV connectors. ReplicatedKVCacheMemory subclasses this class for caches that are replicated across tensor-parallel (TP) shards, as with multi-head latent attention (MLA).
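The [num_pages, bytes_per_page] layout can be illustrated without MAX at all. The sketch below uses only the Python standard library: it allocates typed storage, then reinterprets the same memory as a 2-D byte view. The sizes, the float32 element type, and every variable name are assumptions chosen for illustration; this is not how MAX allocates its caches, and a memoryview stands in for the Buffer type.

```python
from array import array

# Hypothetical sizes, chosen only for illustration.
num_pages = 4
elems_per_page = 16                 # float32 elements stored in each page
elem_size = array("f").itemsize     # bytes per float32 element

# Typed backing storage, standing in for a KV cache allocation.
storage = array("f", [0.0] * (num_pages * elems_per_page))
bytes_per_page = elems_per_page * elem_size

# Reinterpret the same memory as raw bytes, then as a 2-D page view.
flat = memoryview(storage).cast("B")
pages = flat.cast("B", shape=[num_pages, bytes_per_page])

# The view shares memory with the typed storage: writing the first
# element of page 1 makes that page's bytes non-zero.
storage[1 * elems_per_page] = 1.0
page_one = pages.tolist()[1]

print(pages.shape, pages.format)
print(any(byte != 0 for byte in page_one))
```

Because the view is just bytes, code that moves or copies pages does not need to know the cache's element type or per-page tensor layout.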
Parameters:
buffer (Buffer)
buffer
buffer: Buffer
total_num_pages
property total_num_pages: int
Returns the total number of pages.
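As a rough mental model, the relationship between buffer and total_num_pages can be sketched with a small stand-in class. PagedByteBuffer is hypothetical and is not the MAX class; it assumes that the page count is simply the first dimension of the 2-D buffer, which matches the documented shape but is not confirmed by this page.

```python
class PagedByteBuffer:
    """Hypothetical stand-in for illustration, not max.nn.kv_cache.KVCacheMemory."""

    def __init__(self, buffer: memoryview) -> None:
        # Require the documented form: a 2-D view of unsigned bytes.
        if buffer.ndim != 2 or buffer.format != "B":
            raise ValueError("expected a 2-D uint8 view [num_pages, bytes_per_page]")
        self.buffer = buffer

    @property
    def total_num_pages(self) -> int:
        # Assumption: the page count is the size of the first dimension.
        return self.buffer.shape[0]


# Usage: 8 pages of 128 bytes each.
raw = bytearray(8 * 128)
shard = PagedByteBuffer(memoryview(raw).cast("B", shape=[8, 128]))
print(shard.total_num_pages)  # prints 8
```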