Python class
AttnKey
class max.nn.kv_cache.AttnKey(batch_size, max_prompt_length, num_partitions)
Bases: AttnKeyInterface
A resolved decode-attention dispatch shape.
Holds the resolved num_partitions (the kernel grid) together with the batch
and prompt dimensions. The runtime max_cache_valid_length is supplied to
pack_into_buffer() rather than stored, so dispatches that differ only
in cache length share one identity. Concrete subclasses
(MHAAttnKey, MLAAttnKey) implement the kernel-specific buffer layout.
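The identity rule above can be illustrated with a minimal, standalone sketch. It does not use the MAX library: SketchAttnKey, its pack_into_buffer() signature, and the flat list layout are hypothetical stand-ins chosen for illustration, on the assumption that a key's identity covers only its three stored fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchAttnKey:
    """Hypothetical stand-in for AttnKey; not the MAX implementation."""

    batch_size: int
    max_prompt_length: int
    num_partitions: int

    def pack_into_buffer(self, max_cache_valid_length: int) -> list[int]:
        # Assumed layout, for illustration only. The cache length is passed
        # in at pack time and is never stored on the key.
        return [
            self.batch_size,
            self.max_prompt_length,
            self.num_partitions,
            max_cache_valid_length,
        ]


# Two dispatches that differ only in cache length build equal keys,
# so a dict keyed on the shape ends up with a single entry for both.
compiled: dict[SketchAttnKey, str] = {}
for cache_len in (512, 4096):
    key = SketchAttnKey(batch_size=8, max_prompt_length=1, num_partitions=4)
    compiled.setdefault(key, "kernel-variant")
    buffer = key.pack_into_buffer(cache_len)

print(len(compiled))  # 1
```

Using a frozen dataclass here is a design choice of the sketch: it gives field-based equality and hashing, which is one simple way to make the cache length irrelevant to a key's identity.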
batch_size
batch_size: int
max_prompt_length
max_prompt_length: int
num_partitions
num_partitions: int