For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Python class

AttnKey

`AttnKey`

class max.nn.kv_cache.AttnKey(batch_size, max_prompt_length, num_partitions)

source

Bases: AttnKeyInterface

A resolved decode-attention dispatch shape.

The resolved num_partitions (the kernel grid) plus the batch and prompt dimensions. The runtime max_cache_valid_length is supplied to pack_into_buffer() rather than stored, so dispatches that differ only in cache length share one identity. Concrete subclasses (MHAAttnKey, MLAAttnKey) implement the kernel-specific buffer layout.

Parameters:

batch_size (int)
max_prompt_length (int)
num_partitions (int)

`batch_size`

batch_size: int

source

`max_prompt_length`

max_prompt_length: int

source

`num_partitions`

num_partitions: int

source

AttnKey​

batch_size​

max_prompt_length​

num_partitions​

`AttnKey`

`batch_size`

`max_prompt_length`

`num_partitions`