For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).
Python class
MLAAttnKey
MLAAttnKey
class max.nn.kv_cache.MLAAttnKey(batch_size, max_prompt_length, num_partitions)
Bases: AttnKey
Decode dispatch key for multi-latent attention (MLA).
pack_into_buffer()
pack_into_buffer(device, max_cache_valid_length)
Packs this key into a kernel dispatch-metadata buffer.
max_cache_valid_length is the runtime cache length; it is supplied
here rather than stored on the key so the key’s identity is independent
of it.
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!