Python class

AttnKey

class max.nn.kv_cache.AttnKey(batch_size, max_prompt_length, num_partitions)

Bases: object

A resolved decode-attention dispatch shape.

A key holds the resolved num_partitions (the kernel grid) together with the batch and prompt dimensions. The runtime max_cache_valid_length is passed to pack_into_buffer() rather than stored on the key, so dispatches that differ only in cache length share one key. Concrete subclasses (MHAAttnKey, MLAAttnKey) implement the kernel-specific buffer layout.

Parameters:

  • batch_size (int)
  • max_prompt_length (int)
  • num_partitions (int)
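
The key-sharing behavior described above can be illustrated with a minimal sketch. It assumes the key is compared and hashed by its three fields. `SketchKey`, `cache`, and `lookup` are hypothetical stand-ins written for this example and are not part of the MAX API.

```python
from dataclasses import dataclass


# Hypothetical stand-in for illustration only; this is NOT the
# max.nn.kv_cache.AttnKey implementation.
@dataclass(frozen=True)
class SketchKey:
    batch_size: int
    max_prompt_length: int
    num_partitions: int


# Hypothetical table of per-shape resources, keyed by dispatch shape.
cache = {}


def lookup(key):
    # Create the resource only the first time a given shape is seen.
    if key not in cache:
        cache[key] = f"resource-{len(cache)}"
    return cache[key]


# The runtime cache length is not a field of the key, so two dispatches
# with the same shape map to the same entry whatever their cache length.
first = lookup(SketchKey(batch_size=4, max_prompt_length=1, num_partitions=8))
second = lookup(SketchKey(batch_size=4, max_prompt_length=1, num_partitions=8))
# A different partition count is a different dispatch shape.
third = lookup(SketchKey(batch_size=4, max_prompt_length=1, num_partitions=16))
```

Leaving the cache length out of the key keeps the number of distinct keys bounded by the shapes that actually occur, not by every cache length seen at runtime.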

batch_size

batch_size: int

max_prompt_length

max_prompt_length: int

num_partitions

num_partitions: int

pack_into_buffer()

pack_into_buffer(device, max_cache_valid_length)

Packs this key into a kernel dispatch-metadata buffer.

max_cache_valid_length is the runtime cache length; it is supplied here rather than stored on the key so the key’s identity is independent of it.

Parameters:

  • device (Device)
  • max_cache_valid_length (int)

Return type:

Buffer
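
The sketch below shows the idea of supplying the cache length at pack time. The real buffer layout is defined by the concrete subclasses and is not documented here, so the layout used (four little-endian 64-bit integers) is purely an assumption. `SketchKey` and `pack_into_bytes` are hypothetical names, and the `device` argument is omitted because the sketch packs into plain bytes rather than a device `Buffer`.

```python
import struct
from dataclasses import dataclass


# Hypothetical stand-in for illustration only; this is NOT the
# max.nn.kv_cache.AttnKey implementation.
@dataclass(frozen=True)
class SketchKey:
    batch_size: int
    max_prompt_length: int
    num_partitions: int

    def pack_into_bytes(self, max_cache_valid_length):
        # Assumed layout: four little-endian int64 values. The real
        # layout is kernel-specific and defined by the subclasses.
        return struct.pack(
            "<4q",
            self.batch_size,
            self.max_prompt_length,
            self.num_partitions,
            max_cache_valid_length,
        )


# One key is reused; only the packed metadata changes with cache length.
key = SketchKey(batch_size=2, max_prompt_length=1, num_partitions=4)
short_ctx = key.pack_into_bytes(128)
long_ctx = key.pack_into_bytes(4096)
```

Both calls use the same key object, which is the property the method description relies on: the cache length affects the packed metadata but not the key's identity.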