For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Python class

ArchConfigWithStoredKVParams

`ArchConfigWithStoredKVParams`

class max.pipelines.lib.interfaces.ArchConfigWithStoredKVParams

source

Bases: ArchConfigWithBoundedMaxSeqLen

Mixin that implements get_kv_params() as the kv_params field.

Architecture dataclasses that precompute KVCacheParams (or another KVCacheParamInterface) during initialize can inherit this mixin together with ArchConfigWithKVCache to avoid duplicating the trivial accessor.

Also provides a default construct_kv_params() for the common grouped attention case. Speculative decoding defaults to None via KVCacheConfig.to_params() unless a subclass (e.g. Llama3) passes a nonzero num_draft_tokens. Configs that need a different head/layer mapping or MLA should override construct_kv_params.

`construct_kv_params()`

classmethod construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

source

Default KV params for standard grouped attention.

Parameters:

huggingface_config (AutoConfig)
pipeline_config (PipelineConfig)
devices (list[DeviceRef])
kv_cache_config (KVCacheConfig)
cache_dtype (DType)

Return type:

KVCacheParams

`get_head_dim()`

static get_head_dim(huggingface_config)

source

Attention head size from head_dim or hidden_size // num_attention_heads.

Parameters:: huggingface_config (AutoConfig)
Return type:: int

`get_kv_params()`

get_kv_params()

source

Returns the KV cache parameters computed for this config.

Return type:: KVCacheParams

`get_num_layers()`

static get_num_layers(huggingface_config)

source

Layer count for the decoder stack (override when HF uses a different field).

Parameters:: huggingface_config (AutoConfig)
Return type:: int

`kv_params`

kv_params: KVCacheParams

source

ArchConfigWithStoredKVParams​

construct_kv_params()​

get_head_dim()​

get_kv_params()​

get_num_layers()​

kv_params​