Python class
ArchConfigWithAttentionKVCache
class max.pipelines.lib.interfaces.ArchConfigWithAttentionKVCache(dtype, devices=<factory>, cache_dtype=None, kv_cache=<factory>, data_parallel_degree=1, user_provided_max_length=None, huggingface_config=None, _kv_params=None)
Bases: ArchConfigWithKVCache, ABC
Predefined configuration for architectures that use attention KV cache blocks.
Subclasses must define the following attributes:
- num_key_value_heads: int
- head_dim: int
- num_layers: int
- model_max_seq_len: int
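The subclass contract above can be sketched with plain `abc` machinery. This is a minimal illustration, not the MAX implementation: `AttentionArchConfigSketch`, `ToyConfig`, and `IncompleteConfig` are hypothetical stand-ins, and the numeric values are made up. It only shows that a subclass missing any of the four properties cannot be instantiated.

```python
from abc import ABC, abstractmethod


class AttentionArchConfigSketch(ABC):
    """Hypothetical stand-in for the documented base class (not the MAX class)."""

    @property
    @abstractmethod
    def num_key_value_heads(self) -> int: ...

    @property
    @abstractmethod
    def head_dim(self) -> int: ...

    @property
    @abstractmethod
    def num_layers(self) -> int: ...

    @property
    @abstractmethod
    def model_max_seq_len(self) -> int: ...


class ToyConfig(AttentionArchConfigSketch):
    """Hypothetical subclass defining all four attributes (illustrative values)."""

    @property
    def num_key_value_heads(self) -> int:
        return 8

    @property
    def head_dim(self) -> int:
        return 128

    @property
    def num_layers(self) -> int:
        return 32

    @property
    def model_max_seq_len(self) -> int:
        return 4096


class IncompleteConfig(AttentionArchConfigSketch):
    """Hypothetical subclass that defines only one of the four attributes."""

    @property
    def head_dim(self) -> int:
        return 128


config = ToyConfig()  # succeeds: every abstract property is defined

try:
    IncompleteConfig()  # fails: three abstract properties are still missing
    incomplete_ok = True
except TypeError:
    incomplete_ok = False
```

Here `incomplete_ok` ends up `False`, because Python refuses to instantiate a class that still has abstract members.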
Parameters:
- dtype (DType)
- devices (list[DeviceRef])
- cache_dtype (DType | None)
- kv_cache (KVCacheConfig)
- data_parallel_degree (int)
- user_provided_max_length (int | None)
- huggingface_config (AutoConfig | None)
- _kv_params (KVCacheParams | None)
cache_dtype
The data type to use for the KV cache.
data_parallel_degree
data_parallel_degree: int = 1
The data parallel degree to use when running the model.
devices
The physical devices to use when running the model.
dtype
dtype: DType
The data type to use for the model.
get_kv_params()
get_kv_params()
Returns the KV cache parameters for this architecture.
Return type:
get_max_seq_len()
get_max_seq_len()
Returns the maximum sequence length the model can process.
Returns max_length (the value supplied as user_provided_max_length) if set, otherwise model_max_seq_len.
Raises ValueError if max_length exceeds model_max_seq_len.
Return type:
int
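The resolution rule described for get_max_seq_len() can be sketched as a standalone function. This is an assumption-based illustration of the documented behavior, not the library's code: `resolve_max_seq_len` is a hypothetical helper, and treating a value equal to the model limit as valid is inferred from the word "exceeds".

```python
from typing import Optional


def resolve_max_seq_len(max_length: Optional[int], model_max_seq_len: int) -> int:
    """Hypothetical helper mirroring the documented fallback and validation."""
    if max_length is None:
        # No override was given, so fall back to the model's own limit.
        return model_max_seq_len
    if max_length > model_max_seq_len:
        # An override larger than the model limit is rejected.
        raise ValueError(
            f"max_length ({max_length}) exceeds model_max_seq_len "
            f"({model_max_seq_len})"
        )
    return max_length


default_len = resolve_max_seq_len(None, 4096)   # falls back to 4096
capped_len = resolve_max_seq_len(2048, 4096)    # override is accepted: 2048
```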
head_dim
abstract property head_dim: int
Dimensionality of each attention head.
huggingface_config
huggingface_config: AutoConfig | None = None
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initialize the config from a PipelineConfig.
Parameters:
- pipeline_config (PipelineConfig) – The pipeline configuration.
- model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.
Return type:
Self
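The model_config fallback described for initialize() can be sketched with small stand-in classes. Everything here is hypothetical: `ModelConfigSketch`, `PipelineConfigSketch`, and `ArchConfigSketch` are not MAX types, and the `name` and `source_model` fields exist only to make the selection visible. The sketch shows only which model config gets read, not how the real arch config is populated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for a model configuration."""
    name: str


@dataclass
class PipelineConfigSketch:
    """Hypothetical stand-in for a pipeline configuration."""
    model: ModelConfigSketch
    draft_model: Optional[ModelConfigSketch] = None


@dataclass
class ArchConfigSketch:
    """Hypothetical arch config that records which model config it was read from."""
    source_model: str

    @classmethod
    def initialize(
        cls,
        pipeline_config: PipelineConfigSketch,
        model_config: Optional[ModelConfigSketch] = None,
    ) -> "ArchConfigSketch":
        # With no explicit model config, read from the pipeline's main model.
        if model_config is None:
            model_config = pipeline_config.model
        return cls(source_model=model_config.name)


pipeline = PipelineConfigSketch(
    model=ModelConfigSketch("main"),
    draft_model=ModelConfigSketch("draft"),
)

main_arch = ArchConfigSketch.initialize(pipeline)
draft_arch = ArchConfigSketch.initialize(pipeline, pipeline.draft_model)
```

In this sketch `main_arch` is built from the main model and `draft_arch` from the draft model, which is the selection the parameter description implies.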
kv_cache
kv_cache: KVCacheConfig
The KV cache configuration to use when running the model.
model_max_seq_len
abstract property model_max_seq_len: int
The maximum sequence length that can be processed by the model.
num_key_value_heads
abstract property num_key_value_heads: int
Number of key-value heads to use for the KV cache.
num_layers
abstract property num_layers: int
Number of hidden layers in the model.
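As a rough illustration of why num_key_value_heads, head_dim, and num_layers matter for an attention KV cache, the sketch below uses the common back-of-the-envelope estimate of cache bytes per token: one key vector and one value vector per KV head per layer. The helper `kv_cache_bytes_per_token` and the example values are assumptions for illustration; the memory MAX actually allocates may differ (for example because of paging or padding).

```python
def kv_cache_bytes_per_token(
    num_layers: int,
    num_key_value_heads: int,
    head_dim: int,
    bytes_per_element: int,
) -> int:
    """Hypothetical estimate: keys plus values, per head, per layer."""
    keys_and_values = 2  # one K vector and one V vector per head
    return (
        keys_and_values
        * num_layers
        * num_key_value_heads
        * head_dim
        * bytes_per_element
    )


# Illustrative values: 32 layers, 8 KV heads, head_dim 128, 2-byte elements.
per_token = kv_cache_bytes_per_token(
    num_layers=32,
    num_key_value_heads=8,
    head_dim=128,
    bytes_per_element=2,
)
print(per_token)  # 131072 bytes, i.e. 128 KiB per token
```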
user_provided_max_length
Override for the maximum sequence length.