
Python class

AttnKey

`AttnKey`

class max.nn.kv_cache.AttnKey(batch_size, max_prompt_length, num_partitions)

Bases: object

A resolved decode-attention dispatch shape.

A key holds the resolved num_partitions (the kernel grid) plus the batch and prompt dimensions. The runtime max_cache_valid_length is passed to pack_into_buffer() rather than stored on the key, so dispatches that differ only in cache length share one key. Concrete subclasses (MHAAttnKey, MLAAttnKey) implement the kernel-specific buffer layout.
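The key-sharing behavior described above can be illustrated with a minimal sketch. It does not use the MAX library: `SketchAttnKey`, `plans`, and `lookup_plan` are hypothetical stand-ins, and the sketch assumes that a key compares and hashes by its three fields, which this page does not confirm for the real class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchAttnKey:
    """Hypothetical stand-in for AttnKey; not the MAX class."""

    batch_size: int
    max_prompt_length: int
    num_partitions: int


# Hypothetical per-shape cache, e.g. of launch plans keyed by dispatch shape.
plans: dict[SketchAttnKey, str] = {}


def lookup_plan(key: SketchAttnKey) -> str:
    # Build a plan the first time a shape is seen, then reuse it.
    if key not in plans:
        plans[key] = f"plan-{len(plans)}"
    return plans[key]


# Two decode steps with the same shape but different cache lengths
# (say 512 and 4096 tokens) build equal keys, because the cache
# length is not a field of the key.
step_a = SketchAttnKey(batch_size=8, max_prompt_length=1, num_partitions=4)
step_b = SketchAttnKey(batch_size=8, max_prompt_length=1, num_partitions=4)
```

Under these assumptions, `lookup_plan(step_a)` and `lookup_plan(step_b)` return the same entry, so the cache holds one plan per shape regardless of how long the KV cache grows.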

Parameters:

batch_size (int)
max_prompt_length (int)
num_partitions (int)

`batch_size`

batch_size: int

`max_prompt_length`

max_prompt_length: int

`num_partitions`

num_partitions: int

`pack_into_buffer()`

pack_into_buffer(device, max_cache_valid_length)

Packs this key into a kernel dispatch-metadata buffer.

max_cache_valid_length is the runtime cache length; it is supplied here rather than stored on the key so the key’s identity is independent of it.

Parameters:

device (Device)
max_cache_valid_length (int)

Return type:

Buffer
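The split between a stored shape and a runtime cache length might look like the sketch below. Everything in it is an assumption made for illustration: `SketchKey` and `SketchMHAKey` are hypothetical classes, the four-integer layout is invented, and a plain Python list stands in for the `Buffer` return type. The real layout is defined by the MHAAttnKey and MLAAttnKey subclasses and is not documented here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchKey(ABC):
    """Hypothetical base: stores only the shape, never the cache length."""

    batch_size: int
    max_prompt_length: int
    num_partitions: int

    @abstractmethod
    def pack_into_buffer(
        self, device: str, max_cache_valid_length: int
    ) -> list[int]:
        """Subclasses decide the kernel-specific layout."""


@dataclass(frozen=True)
class SketchMHAKey(SketchKey):
    def pack_into_buffer(
        self, device: str, max_cache_valid_length: int
    ) -> list[int]:
        # Invented layout: the three stored fields, then the runtime length.
        # A real implementation would allocate the buffer on `device`;
        # this sketch ignores it and returns a list.
        return [
            self.batch_size,
            self.max_prompt_length,
            self.num_partitions,
            max_cache_valid_length,
        ]


key = SketchMHAKey(batch_size=8, max_prompt_length=1, num_partitions=4)
short_ctx = key.pack_into_buffer("cpu", 512)
long_ctx = key.pack_into_buffer("cpu", 4096)
```

One key produces both buffers; only the slot carrying the runtime cache length differs between `short_ctx` and `long_ctx`.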
