Python class

MLAAttnKey

class max.nn.kv_cache.MLAAttnKey(batch_size, max_prompt_length, num_partitions)

Bases: AttnKey

Decode dispatch key for multi-latent attention (MLA).

Parameters:

batch_size (int)
max_prompt_length (int)
num_partitions (int)

`pack_into_buffer()`

pack_into_buffer(device, max_cache_valid_length)

Packs this key into a kernel dispatch-metadata buffer.

max_cache_valid_length is the runtime cache length. It is passed in at pack time rather than stored on the key, so the key’s identity does not depend on it.

Parameters:

device (Device)
max_cache_valid_length (int)

Return type:

Buffer
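
The separation between key identity and runtime cache length can be illustrated with a small stand-in. This is a minimal sketch, not the MAX implementation: `SketchDecodeKey`, its `pack()` method, and the four-int64 byte layout are all hypothetical, and the `device` argument is omitted. It only shows why keeping `max_cache_valid_length` off the key lets two calls with different cache lengths share one dispatch entry.

```python
from dataclasses import dataclass
import struct


@dataclass(frozen=True)
class SketchDecodeKey:
    """Hypothetical stand-in for a decode dispatch key (not the MAX class)."""

    batch_size: int
    max_prompt_length: int
    num_partitions: int

    def pack(self, max_cache_valid_length: int) -> bytes:
        # Illustrative layout only (an assumption): four little-endian int64s.
        # The runtime cache length is an argument, not a field on the key.
        return struct.pack(
            "<4q",
            self.batch_size,
            self.max_prompt_length,
            self.num_partitions,
            max_cache_valid_length,
        )


key_a = SketchDecodeKey(batch_size=8, max_prompt_length=1, num_partitions=4)
key_b = SketchDecodeKey(batch_size=8, max_prompt_length=1, num_partitions=4)

# Equal fields give equal, hashable keys, so one cached entry serves both.
compiled = {key_a: "decode-graph-0"}
assert compiled[key_b] == "decode-graph-0"

# The runtime cache length changes the packed bytes but not the key itself.
short = key_a.pack(max_cache_valid_length=128)
long = key_a.pack(max_cache_valid_length=4096)
assert short != long
```

If the cache length were a field on the key, every new length would produce a distinct key and a separate lookup entry; passing it at pack time avoids that.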
