Python class

ArchConfigWithStoredKVParams

class max.pipelines.lib.interfaces.ArchConfigWithStoredKVParams

Bases: ArchConfigWithBoundedMaxSeqLen

Mixin that implements get_kv_params() by returning the stored kv_params field.

Architecture dataclasses that precompute KVCacheParams (or another KVCacheParamInterface) during initialize can inherit this mixin together with ArchConfigWithKVCache to avoid duplicating the trivial accessor.

The mixin also provides a default construct_kv_params() for the common grouped-attention case. In that default, speculative decoding is left as None by KVCacheConfig.to_params() unless a subclass (e.g. Llama3) passes a nonzero num_draft_tokens. Configs that need a different head or layer mapping, or that use multi-head latent attention (MLA), should override construct_kv_params().
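
The stored-field pattern described above can be sketched without importing MAX at all. This is a minimal illustration, not the library's implementation: StandInKVParams, StoredKVParamsMixin, and ToyArchConfig are hypothetical names, and the three fields on the stand-in are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInKVParams:
    """Hypothetical stand-in for KVCacheParams; not the MAX class."""

    n_kv_heads: int
    head_dim: int
    num_layers: int


class StoredKVParamsMixin:
    """Sketch of the mixin idea: the accessor returns the stored field."""

    kv_params: StandInKVParams

    def get_kv_params(self) -> StandInKVParams:
        return self.kv_params


@dataclass
class ToyArchConfig(StoredKVParamsMixin):
    """Hypothetical architecture config that precomputes its KV params."""

    kv_params: StandInKVParams


config = ToyArchConfig(
    kv_params=StandInKVParams(n_kv_heads=8, head_dim=128, num_layers=32)
)

# The accessor hands back exactly the object stored at construction time.
print(config.get_kv_params() is config.kv_params)
```

Because the accessor lives on the mixin, each architecture dataclass only needs to declare and populate the kv_params field.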

construct_kv_params()

classmethod construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

Default KV params for standard grouped attention.

Parameters:

huggingface_config (AutoConfig)

pipeline_config

devices

kv_cache_config

cache_dtype

Return type:

KVCacheParams
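
A rough sketch of how a default for grouped attention might be assembled from the two static helpers documented on this page. Everything here is an assumption made for illustration: the stand-in class, its field names, the use of num_key_value_heads and num_hidden_layers, and the reduced signature (the real method also takes pipeline_config, devices, kv_cache_config, and cache_dtype, which this sketch omits).

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class StandInKVParams:
    """Hypothetical stand-in for KVCacheParams; not the MAX class."""

    n_kv_heads: int
    head_dim: int
    num_layers: int


class GroupedAttentionSketch:
    """Hypothetical config class; only mirrors the documented method names."""

    @staticmethod
    def get_head_dim(hf_config) -> int:
        # Assumption: prefer an explicit head_dim, else derive it.
        head_dim = getattr(hf_config, "head_dim", None)
        if head_dim is not None:
            return head_dim
        return hf_config.hidden_size // hf_config.num_attention_heads

    @staticmethod
    def get_num_layers(hf_config) -> int:
        # Assumption: the default field is num_hidden_layers.
        return hf_config.num_hidden_layers

    @classmethod
    def construct_kv_params(cls, hf_config) -> StandInKVParams:
        # Calling the helpers through cls lets subclass overrides take effect.
        return StandInKVParams(
            n_kv_heads=hf_config.num_key_value_heads,
            head_dim=cls.get_head_dim(hf_config),
            num_layers=cls.get_num_layers(hf_config),
        )


# A config shaped like a grouped-query-attention model (values are made up).
llama_like = SimpleNamespace(
    hidden_size=4096,
    num_attention_heads=32,
    num_key_value_heads=8,
    num_hidden_layers=32,
)
params = GroupedAttentionSketch.construct_kv_params(llama_like)
print(params)
```

Routing through cls rather than hard-coding the helpers is what would let a subclass change only the head or layer lookup without rewriting the constructor.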

get_head_dim()

static get_head_dim(huggingface_config)

Returns the attention head size: head_dim if the config provides it, otherwise hidden_size // num_attention_heads.

Parameters:

huggingface_config (AutoConfig)

Return type:

int
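
The fallback can be illustrated with plain namespace objects in place of a real AutoConfig. This is a sketch under assumptions: head_dim_from_config is a hypothetical helper, and treating a missing or None head_dim as "not provided" is a guess at the behavior rather than something this page confirms.

```python
from types import SimpleNamespace


def head_dim_from_config(config) -> int:
    """Prefer an explicit head_dim; otherwise derive it from the hidden size."""
    head_dim = getattr(config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    return config.hidden_size // config.num_attention_heads


# Config that states head_dim directly (it need not equal the derived value).
explicit = SimpleNamespace(head_dim=256, hidden_size=4096, num_attention_heads=32)
# Config without head_dim: falls back to 4096 // 32.
derived = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

print(head_dim_from_config(explicit))  # 256
print(head_dim_from_config(derived))   # 128
```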

get_kv_params()

get_kv_params()

Returns the KV cache parameters computed for this config.

Return type:

KVCacheParams

get_num_layers()

static get_num_layers(huggingface_config)

Layer count for the decoder stack. Override this when the Hugging Face config stores the count under a different field.

Parameters:

huggingface_config (AutoConfig)

Return type:

int
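
An override might look like the following sketch. The class names are hypothetical, and the assumption that the default reads num_hidden_layers is not confirmed by this page; n_layer is used as the alternative because GPT-2-style Hugging Face configs name the field that way.

```python
from types import SimpleNamespace


class BaseConfigSketch:
    """Hypothetical base; assumes the default field is num_hidden_layers."""

    @staticmethod
    def get_num_layers(huggingface_config) -> int:
        return huggingface_config.num_hidden_layers


class NLayerConfigSketch(BaseConfigSketch):
    """Hypothetical subclass for configs that store the count as n_layer."""

    @staticmethod
    def get_num_layers(huggingface_config) -> int:
        return huggingface_config.n_layer


standard = SimpleNamespace(num_hidden_layers=32)
gpt2_like = SimpleNamespace(n_layer=12)

print(BaseConfigSketch.get_num_layers(standard))     # 32
print(NLayerConfigSketch.get_num_layers(gpt2_like))  # 12
```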

kv_params

kv_params: KVCacheParams
