IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.
Skip to main content
For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Python class

AttnKey

AttnKey​

class max.nn.kv_cache.AttnKey(batch_size, max_prompt_length, num_partitions)

source

Bases: AttnKeyInterface

A resolved decode-attention dispatch shape.

The resolved num_partitions (the kernel grid) plus the batch and prompt dimensions. The runtime max_cache_valid_length is supplied to pack_into_buffer() rather than stored, so dispatches that differ only in cache length share one identity. Concrete subclasses (MHAAttnKey, MLAAttnKey) implement the kernel-specific buffer layout.

Parameters:

  • batch_size (int)
  • max_prompt_length (int)
  • num_partitions (int)

batch_size​

batch_size: int

source

max_prompt_length​

max_prompt_length: int

source

num_partitions​

num_partitions: int

source