Python module

max.pipelines.architectures.deepseekV3_2

DeepSeek-V3.2 mixture-of-experts architecture for text generation.
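
To illustrate how a mixture-of-experts layer picks experts for each token, here is a minimal plain-Python sketch. It assumes the routing behavior implied by the `DeepseekV3_2Config` defaults (`scoring_func='sigmoid'`, `topk_method='greedy'`, `num_experts_per_tok=8`, `norm_topk_prob=True`, `routed_scaling_factor=2.5`). It is not MAX's kernel, and the function name `route_token` is hypothetical. Group-limited selection (`n_group`, `topk_group`) and any score-correction bias are omitted.

```python
import math


def route_token(router_logits, num_experts_per_tok=8, norm_topk_prob=True,
                routed_scaling_factor=2.5):
    """Hypothetical greedy top-k router for a single token.

    Returns (expert_index, weight) pairs, highest-scoring expert first.
    """
    # scoring_func='sigmoid': each expert is scored independently in (0, 1).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]

    # topk_method='greedy' (assumed meaning): keep the k highest-scoring experts.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:num_experts_per_tok]
    weights = [scores[i] for i in chosen]

    # norm_topk_prob=True: renormalize the kept scores so they sum to 1.
    if norm_topk_prob:
        total = sum(weights)
        weights = [w / total for w in weights]

    # routed_scaling_factor multiplies the routed-expert weights; in DeepSeek-style
    # MoE the weighted expert outputs are added to the shared experts' output.
    weights = [w * routed_scaling_factor for w in weights]
    return list(zip(chosen, weights))


# Toy example: 16 experts whose logits increase with the expert index.
routing = route_token([0.1 * i for i in range(16)], num_experts_per_tok=4)
print([expert for expert, _ in routing])  # [15, 14, 13, 12]
```

Because the weights are renormalized before scaling, the routed weights for a token in this sketch always sum to `routed_scaling_factor`.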

`DeepseekV3_2Config`

class max.pipelines.architectures.deepseekV3_2.DeepseekV3_2Config(*, dtype, kv_params, devices, use_subgraphs=True, data_parallel_degree=1, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='greedy', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, norm_dtype=bfloat16, gate_dtype=None, correction_bias_dtype=None, max_batch_context_length=131072, quant_config=None, dense_mlp_layers_without_quant=frozenset({}), ep_config=None, graph_mode='auto', return_logits=ReturnLogits.LAST_TOKEN, return_hidden_states=ReturnHiddenStates.NONE, eagle_aux_hidden_state_layer_ids=None, eplb_profile_enabled=False, index_head_dim=128, index_n_heads=64, index_topk=2048, indexer_types=<factory>)

Bases: DeepseekV3Config

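Configuration for DeepSeek-V3.2 models. Extends DeepseekV3Config with the indexer fields index_head_dim, index_n_heads, index_topk, and indexer_types.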

Parameters:

dtype (DType)
kv_params (KVCacheParamInterface)
devices (list[DeviceRef])
use_subgraphs (bool)
data_parallel_degree (int)
vocab_size (int)
hidden_size (int)
intermediate_size (int)
moe_intermediate_size (int)
moe_layer_freq (int)
num_hidden_layers (int)
num_attention_heads (int)
num_key_value_heads (int)
n_shared_experts (int)
n_routed_experts (int)
routed_scaling_factor (float)
kv_lora_rank (int)
q_lora_rank (int)
qk_rope_head_dim (int)
v_head_dim (int)
qk_nope_head_dim (int)
topk_method (str)
n_group (int)
topk_group (int)
num_experts_per_tok (int)
first_k_dense_replace (int)
norm_topk_prob (bool)
hidden_act (str)
max_position_embeddings (int)
max_seq_len (int)
rms_norm_eps (float)
tie_word_embeddings (bool)
rope_theta (float)
rope_scaling (dict[str, Any] | None)
rope_interleave (bool)
scoring_func (str)
attention_bias (bool)
attention_dropout (float)
norm_dtype (DType)
gate_dtype (DType | None)
correction_bias_dtype (DType | None)
max_batch_context_length (int)
quant_config (QuantConfig | None)
dense_mlp_layers_without_quant (frozenset[int])
ep_config (EPConfig | None)
graph_mode (str)
return_logits (ReturnLogits)
return_hidden_states (ReturnHiddenStates)
eagle_aux_hidden_state_layer_ids (list[int] | None)
eplb_profile_enabled (bool)
index_head_dim (int)
index_n_heads (int)
index_topk (int)
indexer_types (list[str])
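
A minimal sketch of how the layer-layout fields combine, assuming the rule from DeepSeek's reference Hugging Face modeling code: a decoder layer uses an MoE block when `layer_idx >= first_k_dense_replace` and `layer_idx % moe_layer_freq == 0`, and a dense MLP of width `intermediate_size` otherwise. MAX's own layer construction may differ; `LayerPlan` and `plan_layers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerPlan:
    index: int
    kind: str       # "dense" or "moe"
    ffn_width: int  # hidden width of the dense MLP, or of each routed expert


def plan_layers(num_hidden_layers=61, first_k_dense_replace=3, moe_layer_freq=1,
                intermediate_size=18432, moe_intermediate_size=2048):
    """Hypothetical helper: classify each decoder layer as dense or MoE."""
    plans = []
    for i in range(num_hidden_layers):
        # Assumed rule from the reference modeling code, not MAX's implementation.
        is_moe = i >= first_k_dense_replace and i % moe_layer_freq == 0
        if is_moe:
            plans.append(LayerPlan(i, "moe", moe_intermediate_size))
        else:
            plans.append(LayerPlan(i, "dense", intermediate_size))
    return plans


plans = plan_layers()  # DeepseekV3_2Config defaults
print(sum(p.kind == "moe" for p in plans))  # 58
```

Under this convention and the defaults, layers 0 to 2 are dense and layers 3 to 60 are MoE, each MoE layer combining `n_routed_experts=256` routed experts with `n_shared_experts=1` shared expert.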

`construct_kv_params()`

static construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

Parameters:

huggingface_config (AutoConfig)
pipeline_config (PipelineConfig)
devices (list[DeviceRef])
kv_cache_config (KVCacheConfig)
cache_dtype (DType)

Return type:

KVCacheParamInterface
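
construct_kv_params() builds the KV cache parameters from the Hugging Face config. DeepSeek's multi-head latent attention (MLA) caches a compressed latent of width `kv_lora_rank` plus a shared RoPE key of width `qk_rope_head_dim` per token and layer, instead of full per-head keys and values. The back-of-the-envelope sketch below assumes the MAX cache follows that layout and that `cache_dtype` is a 2-byte type such as bfloat16. It ignores any extra per-token state kept for the attention indexer, and the helper names are hypothetical.

```python
def mla_head_dims(qk_nope_head_dim=128, qk_rope_head_dim=64, kv_lora_rank=512):
    """Return (per-head query/key width, cached width per token and layer)."""
    # Each attention head's query/key has a non-rotary part and a RoPE part.
    qk_head_dim = qk_nope_head_dim + qk_rope_head_dim
    # Assumed MLA cache entry: compressed KV latent plus the shared RoPE key.
    cache_width = kv_lora_rank + qk_rope_head_dim
    return qk_head_dim, cache_width


def kv_cache_bytes_per_token(num_hidden_layers=61, kv_lora_rank=512,
                             qk_rope_head_dim=64, bytes_per_element=2):
    """Approximate cache bytes per token across all layers (bf16 assumed)."""
    _, cache_width = mla_head_dims(kv_lora_rank=kv_lora_rank,
                                   qk_rope_head_dim=qk_rope_head_dim)
    return num_hidden_layers * cache_width * bytes_per_element


def uncompressed_elements_per_layer(num_key_value_heads=128, qk_nope_head_dim=128,
                                    qk_rope_head_dim=64, v_head_dim=128):
    """Elements per token and layer if full per-head K and V were cached instead."""
    return num_key_value_heads * (qk_nope_head_dim + qk_rope_head_dim + v_head_dim)


print(mla_head_dims())                    # (192, 576)
print(kv_cache_bytes_per_token())         # 70272
print(uncompressed_elements_per_layer())  # 40960
```

With the defaults, the latent cache holds 576 elements per token and layer versus 40,960 for full keys and values, roughly a 71x reduction.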

`index_head_dim`

index_head_dim: int = 128

`index_n_heads`

index_n_heads: int = 64

`index_topk`

index_topk: int = 2048
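
The index_* fields configure the token selector that DeepSeek-V3.2 uses for sparse attention. DeepSeek's V3.2 report describes a "lightning indexer". It scores each preceding token s for query token t as I(t, s) = Σ_j w(t, j) · ReLU(q(t, j) · k(s)), summed over indexer heads j. The main attention then attends only to the `index_topk` highest-scoring tokens. The sketch below assumes `index_n_heads` and `index_head_dim` are the number and width of those indexer heads. It is pure Python for readability, not MAX's kernel, and the function names are hypothetical.

```python
def index_scores(query_heads, head_weights, keys):
    """Indexer scores for one query token against each preceding token's key.

    query_heads:  index_n_heads vectors of length index_head_dim
    head_weights: one scalar weight per indexer head
    keys:         one length-index_head_dim key per preceding token (causal)
    """
    scores = []
    for key in keys:
        score = 0.0
        for q, w in zip(query_heads, head_weights):
            dot = sum(a * b for a, b in zip(q, key))
            score += w * max(dot, 0.0)  # ReLU, then weight by head
        scores.append(score)
    return scores


def select_tokens(scores, index_topk=2048):
    """Positions of the index_topk highest-scoring tokens, in sequence order."""
    k = min(index_topk, len(scores))
    ranked = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return sorted(ranked[:k])


# Toy example: 2 indexer heads of width 2, 4 preceding tokens.
scores = index_scores([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.5],
                      [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [2.0, 2.0]])
print(scores)                    # [1.0, 0.5, 0.0, 3.0]
print(select_tokens(scores, 2))  # [0, 3]
```

In this sketch, a query with at most `index_topk` (2048 by default) preceding tokens selects all of them, so attention is effectively dense. Selection only becomes sparse for longer contexts.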

`indexer_types`

indexer_types: list[str]

`initialize()`

classmethod initialize(pipeline_config, model_config=None)

Initializes a DeepseekV3_2Config instance from the pipeline configuration.

This method populates every field that can be determined from the pipeline configuration alone, without loading the state_dict. Fields that depend on the state_dict (such as norm_dtype and quant_config) should be set directly on the returned instance after calling this method.

Parameters:

pipeline_config (PipelineConfig) – The MAX Engine pipeline configuration.
model_config (MAXModelConfig | None)

Returns:

An initialized DeepseekV3_2Config instance.

Return type:

Self
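
The two-phase pattern described for initialize() can be sketched with a stand-in class. The config is built from pipeline-level settings first, and the checkpoint-dependent fields are assigned once the weights have been inspected. `ToyConfig`, `finalize_from_weights`, the string dtype names, and the dict standing in for `quant_config` are all hypothetical. They only mirror the documented flow and are not MAX APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    """Hypothetical stand-in for DeepseekV3_2Config with three fields."""
    hidden_size: int
    norm_dtype: str = "bfloat16"
    quant_config: Optional[dict] = None

    @classmethod
    def initialize(cls, pipeline_settings: dict) -> "ToyConfig":
        # Phase 1: only fields derivable without reading the weights.
        return cls(hidden_size=pipeline_settings["hidden_size"])


def finalize_from_weights(config: ToyConfig, weight_dtypes: dict) -> ToyConfig:
    """Phase 2 (hypothetical): set fields that depend on the state_dict."""
    norm_dtypes = [dt for name, dt in weight_dtypes.items() if name.endswith("norm.weight")]
    if norm_dtypes:
        config.norm_dtype = norm_dtypes[0]
    if any(dt.startswith("float8") for dt in weight_dtypes.values()):
        config.quant_config = {"weights": "fp8"}  # placeholder, not a real QuantConfig
    return config


config = ToyConfig.initialize({"hidden_size": 7168})
config = finalize_from_weights(config, {
    "model.norm.weight": "float32",
    "model.layers.3.mlp.experts.0.gate_proj.weight": "float8_e4m3fn",
})
print(config.norm_dtype, config.quant_config)  # float32 {'weights': 'fp8'}
```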

`DeepseekV3_2Model`

class max.pipelines.architectures.deepseekV3_2.DeepseekV3_2Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE, max_batch_size=1)

Bases: DeepseekV3Model

A DeepSeek-V3.2 model.

Parameters:

pipeline_config (PipelineConfig)
session (InferenceSession)
devices (list[Device])
kv_cache_config (KVCacheConfig)
weights (Weights)
adapter (WeightsAdapter | None)
return_logits (ReturnLogits)
return_hidden_states (ReturnHiddenStates)
max_batch_size (int)
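
DeepseekV3_2Model defaults to return_logits=ReturnLogits.ALL, while DeepseekV3_2Config defaults to ReturnLogits.LAST_TOKEN. Assume the member names mean what they suggest: logits for every input token versus only each sequence's final token. Under that assumption, the sketch below shows which rows of a ragged batch each mode keeps. Each row has `vocab_size` (129280 by default) entries, so the choice matters for memory. `ToyReturnLogits` and `kept_logit_rows` are hypothetical stand-ins, not the MAX enum.

```python
from enum import Enum


class ToyReturnLogits(Enum):
    """Hypothetical mirror of two ReturnLogits members."""
    LAST_TOKEN = "last_token"
    ALL = "all"


def kept_logit_rows(seq_lens, mode):
    """Flat token positions whose logits are returned for a ragged batch."""
    rows = []
    offset = 0
    for n in seq_lens:
        if mode is ToyReturnLogits.ALL:
            rows.extend(range(offset, offset + n))  # every token of the sequence
        else:
            rows.append(offset + n - 1)  # only the sequence's final token
        offset += n
    return rows


seq_lens = [3, 1, 2]  # three sequences packed into one ragged batch
print(kept_logit_rows(seq_lens, ToyReturnLogits.LAST_TOKEN))  # [2, 3, 5]
print(len(kept_logit_rows(seq_lens, ToyReturnLogits.ALL)))    # 6
```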

`model_config_cls`

model_config_cls

alias of DeepseekV3_2Config
