For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).
Python class
AttnKey
AttnKey
class max.nn.kv_cache.AttnKey(batch_size, max_prompt_length, num_partitions)
Bases: object
A resolved decode-attention dispatch shape.
The resolved num_partitions (the kernel grid) plus the batch and prompt
dimensions. The runtime max_cache_valid_length is supplied to
pack_into_buffer() rather than stored, so dispatches that differ only
in cache length share one key. Concrete subclasses (MHAAttnKey,
MLAAttnKey) implement the kernel-specific buffer layout.
batch_size
batch_size: int
max_prompt_length
max_prompt_length: int
num_partitions
num_partitions: int
pack_into_buffer()
pack_into_buffer(device, max_cache_valid_length)
Packs this key into a kernel dispatch-metadata buffer.
max_cache_valid_length is the runtime cache length; it is supplied
here rather than stored on the key so the key’s identity is independent
of it.
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!