Python class
ArchConfigWithStoredKVParams
class max.pipelines.lib.interfaces.ArchConfigWithStoredKVParams
Bases: ArchConfigWithBoundedMaxSeqLen
Mixin that implements get_kv_params() by returning the kv_params field.
Architecture dataclasses that precompute KVCacheParams
(or another KVCacheParamInterface) during initialize can inherit
this mixin together with ArchConfigWithKVCache to avoid duplicating
the trivial accessor.
Also provides a default construct_kv_params() for the common grouped
attention case. Speculative decoding defaults to None via
KVCacheConfig.to_params() unless a subclass (e.g. Llama3) passes a
nonzero num_draft_tokens. Configs that need a different head/layer
mapping or MLA should override construct_kv_params.
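The stored-field accessor pattern described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `StandInKVParams`, `StoredKVParamsMixin`, and `ToyArchConfig` are hypothetical stand-ins, and the fields on `StandInKVParams` are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StandInKVParams:
    """Hypothetical stand-in for KVCacheParams; fields are illustrative only."""

    n_kv_heads: int
    head_dim: int
    num_layers: int


class StoredKVParamsMixin:
    """Illustrative mixin: exposes a precomputed kv_params field via an accessor."""

    kv_params: Any  # supplied by the concrete dataclass

    def get_kv_params(self) -> Any:
        # The accessor is trivial: return whatever was stored on the config.
        return self.kv_params


@dataclass
class ToyArchConfig(StoredKVParamsMixin):
    """Hypothetical architecture config that stores its KV params up front."""

    kv_params: StandInKVParams


config = ToyArchConfig(
    kv_params=StandInKVParams(n_kv_heads=8, head_dim=128, num_layers=32)
)
print(config.get_kv_params())
```

Because the accessor only reads a field, every config that precomputes its parameters can share it instead of redefining the same one-line method.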
construct_kv_params()
classmethod construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)
Default KV params for standard grouped attention.
Parameters:
- huggingface_config (AutoConfig)
- pipeline_config (PipelineConfig)
- devices (list[DeviceRef])
- kv_cache_config (KVCacheConfig)
- cache_dtype (DType)
Return type:
get_head_dim()
static get_head_dim(huggingface_config)
Attention head size from head_dim or hidden_size // num_attention_heads.
Parameters:
- huggingface_config (AutoConfig)
Return type:
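The fallback described for get_head_dim() can be sketched as follows. The helper name `head_dim_from_config` is hypothetical, `SimpleNamespace` stands in for a Hugging Face config object, and treating a `head_dim` of `None` as missing is an assumption; the actual method may differ in detail.

```python
from types import SimpleNamespace


def head_dim_from_config(hf_config) -> int:
    """Sketch of the described fallback; not the library's actual code."""
    # Prefer an explicit head_dim when the config provides one.
    explicit = getattr(hf_config, "head_dim", None)
    if explicit is not None:
        return explicit
    # Otherwise derive it from the hidden size and the attention head count.
    return hf_config.hidden_size // hf_config.num_attention_heads


with_field = SimpleNamespace(head_dim=64, hidden_size=4096, num_attention_heads=32)
without_field = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

print(head_dim_from_config(with_field))     # 64
print(head_dim_from_config(without_field))  # 128
```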
get_kv_params()
get_kv_params()
Returns the KV cache parameters computed for this config.
Return type:
get_num_layers()
static get_num_layers(huggingface_config)
Layer count for the decoder stack (override when HF uses a different field).
Parameters:
- huggingface_config (AutoConfig)
Return type:
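Overriding the layer count for a config that uses a different field might look like the sketch below. The classes `BaseLayerCount` and `CustomLayerCount` are hypothetical. The default field name `num_hidden_layers` is an assumption (this page does not name it), and `n_layer` is used only as an example of an alternative field.

```python
from types import SimpleNamespace


class BaseLayerCount:
    """Hypothetical stand-in for a config class with a default layer lookup."""

    @staticmethod
    def get_num_layers(hf_config) -> int:
        # Assumption: the default reads `num_hidden_layers`.
        return hf_config.num_hidden_layers


class CustomLayerCount(BaseLayerCount):
    """Hypothetical subclass for a config that names the field differently."""

    @staticmethod
    def get_num_layers(hf_config) -> int:
        # Illustrative override: read the count from `n_layer` instead.
        return hf_config.n_layer


standard_cfg = SimpleNamespace(num_hidden_layers=32)
custom_cfg = SimpleNamespace(n_layer=12)

print(BaseLayerCount.get_num_layers(standard_cfg))   # 32
print(CustomLayerCount.get_num_layers(custom_cfg))   # 12
```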
kv_params
kv_params: KVCacheParams