
Python module

max.pipelines.architectures.deepseekV3_2

DeepSeek-V3.2 mixture-of-experts architecture for text generation.

DeepseekV32HFConfig

class max.pipelines.architectures.deepseekV3_2.DeepseekV32HFConfig(index_head_dim=128, index_n_heads=64, index_topk=2048, **kwargs)

Bases: DeepseekV3Config

HuggingFace configuration class for DeepSeek-V3.2 models.

The deepseek_v32 model type is not natively registered in transformers. This subclass of DeepseekV3Config adds the V3.2-specific fields for sparse attention (indexer) and registers itself so that AutoConfig.from_pretrained can load DeepSeek-V3.2 repos.

Parameters:

  • index_head_dim (int)
  • index_n_heads (int)
  • index_topk (int)
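
Registration matters because a loader that dispatches on the `model_type` string in a repo's `config.json` rejects types it does not know. The sketch below is a stdlib-only stand-in for that dispatch, not the transformers implementation: `CONFIG_REGISTRY`, `register_config`, `load_config`, and the two `*Sketch` classes are hypothetical names used only for illustration. In transformers itself, the corresponding hook is `AutoConfig.register`.

```python
"""Stand-in for model_type-based config dispatch (not the transformers code)."""

# Hypothetical registry: maps a model_type string to a config class.
CONFIG_REGISTRY: dict[str, type] = {}


def register_config(cls: type) -> type:
    """Register a config class under its `model_type` attribute."""
    CONFIG_REGISTRY[cls.model_type] = cls
    return cls


class DeepseekV3ConfigSketch:
    """Hypothetical base config; a real config carries many more fields."""

    model_type = "deepseek_v3"

    def __init__(self, **kwargs):
        self.extra = kwargs


@register_config
class DeepseekV32ConfigSketch(DeepseekV3ConfigSketch):
    """Adds the three indexer fields with the defaults documented above."""

    model_type = "deepseek_v32"

    def __init__(self, index_head_dim=128, index_n_heads=64, index_topk=2048, **kwargs):
        super().__init__(**kwargs)
        self.index_head_dim = index_head_dim
        self.index_n_heads = index_n_heads
        self.index_topk = index_topk


def load_config(raw: dict):
    """Pick the config class from raw["model_type"], as an auto-loader would."""
    fields = dict(raw)
    model_type = fields.pop("model_type")
    if model_type not in CONFIG_REGISTRY:
        raise ValueError(f"unrecognized model_type: {model_type!r}")
    return CONFIG_REGISTRY[model_type](**fields)


# Only the V3.2 sketch was registered, so only "deepseek_v32" resolves.
config = load_config({"model_type": "deepseek_v32", "index_topk": 1024})
```

Here `config` is a `DeepseekV32ConfigSketch` whose `index_topk` comes from the input and whose other indexer fields keep their defaults. An unregistered `model_type` raises `ValueError`, which is the failure mode that registration avoids.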

model_type

model_type: str = 'deepseek_v32'

DeepseekV3_2Config

class max.pipelines.architectures.deepseekV3_2.DeepseekV3_2Config(*, dtype, kv_params, devices, use_subgraphs=True, data_parallel_degree=1, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='greedy', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, norm_dtype=bfloat16, gate_dtype=None, correction_bias_dtype=None, max_batch_context_length=131072, quant_config=None, dense_mlp_layers_without_quant=frozenset({}), ep_config=None, graph_mode='auto', return_logits=ReturnLogits.LAST_TOKEN, return_hidden_states=ReturnHiddenStates.NONE, eagle_aux_hidden_state_layer_ids=None, index_head_dim=128, index_n_heads=64, index_topk=2048)

Bases: DeepseekV3Config

Configuration for DeepSeek-V3.2 models.

Parameters:

  • dtype (DType)
  • kv_params (KVCacheParamInterface)
  • devices (list[DeviceRef])
  • use_subgraphs (bool)
  • data_parallel_degree (int)
  • vocab_size (int)
  • hidden_size (int)
  • intermediate_size (int)
  • moe_intermediate_size (int)
  • moe_layer_freq (int)
  • num_hidden_layers (int)
  • num_attention_heads (int)
  • num_key_value_heads (int)
  • n_shared_experts (int)
  • n_routed_experts (int)
  • routed_scaling_factor (float)
  • kv_lora_rank (int)
  • q_lora_rank (int)
  • qk_rope_head_dim (int)
  • v_head_dim (int)
  • qk_nope_head_dim (int)
  • topk_method (str)
  • n_group (int)
  • topk_group (int)
  • num_experts_per_tok (int)
  • first_k_dense_replace (int)
  • norm_topk_prob (bool)
  • hidden_act (str)
  • max_position_embeddings (int)
  • max_seq_len (int)
  • rms_norm_eps (float)
  • tie_word_embeddings (bool)
  • rope_theta (float)
  • rope_scaling (dict[str, Any] | None)
  • rope_interleave (bool)
  • scoring_func (str)
  • attention_bias (bool)
  • attention_dropout (float)
  • norm_dtype (DType)
  • gate_dtype (DType | None)
  • correction_bias_dtype (DType | None)
  • max_batch_context_length (int)
  • quant_config (QuantConfig | None)
  • dense_mlp_layers_without_quant (frozenset[int])
  • ep_config (EPConfig | None)
  • graph_mode (str)
  • return_logits (ReturnLogits)
  • return_hidden_states (ReturnHiddenStates)
  • eagle_aux_hidden_state_layer_ids (list[int] | None)
  • index_head_dim (int)
  • index_n_heads (int)
  • index_topk (int)
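
Several of these defaults are easier to read through the quantities they imply. The sketch below copies a subset of the documented defaults into a hypothetical `ShapeDefaults` dataclass and derives a few sizes from them. It assumes the fields combine as in the public DeepSeek-V3 reference implementation (query/key head width is the sum of the non-rotary and rotary parts, the compressed KV entry holds the latent plus the rotary key, and layer `i` is a mixture-of-experts layer when `i >= first_k_dense_replace` and `i % moe_layer_freq == 0`). The MAX implementation may compute these differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeDefaults:
    """Hypothetical helper holding a subset of the documented defaults."""

    num_hidden_layers: int = 61
    first_k_dense_replace: int = 3
    moe_layer_freq: int = 1
    n_routed_experts: int = 256
    n_group: int = 8
    kv_lora_rank: int = 512
    qk_rope_head_dim: int = 64
    qk_nope_head_dim: int = 128


def qk_head_dim(cfg: ShapeDefaults) -> int:
    """Per-head query/key width: non-rotary part plus rotary part."""
    return cfg.qk_nope_head_dim + cfg.qk_rope_head_dim


def latent_kv_width(cfg: ShapeDefaults) -> int:
    """Elements per token per layer in the compressed KV entry (assumed layout)."""
    return cfg.kv_lora_rank + cfg.qk_rope_head_dim


def moe_layer_ids(cfg: ShapeDefaults) -> list[int]:
    """Layers assumed to use routed experts rather than a dense MLP."""
    return [
        i
        for i in range(cfg.num_hidden_layers)
        if i >= cfg.first_k_dense_replace and i % cfg.moe_layer_freq == 0
    ]


def experts_per_group(cfg: ShapeDefaults) -> int:
    """Routed experts in each routing group."""
    return cfg.n_routed_experts // cfg.n_group


cfg = ShapeDefaults()
summary = {
    "qk_head_dim": qk_head_dim(cfg),              # 128 + 64 = 192
    "latent_kv_width": latent_kv_width(cfg),      # 512 + 64 = 576
    "num_moe_layers": len(moe_layer_ids(cfg)),    # 61 - 3 = 58
    "experts_per_group": experts_per_group(cfg),  # 256 // 8 = 32
}
```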

construct_kv_params()

static construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

Parameters:

  • huggingface_config
  • pipeline_config
  • devices
  • kv_cache_config
  • cache_dtype

Return type:

KVCacheParamInterface

index_head_dim

index_head_dim: int = 128

index_n_heads

index_n_heads: int = 64

index_topk

index_topk: int = 2048
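
As a rough illustration of what a top-k budget means for sparse attention, the sketch below keeps only the highest-scoring key positions for one query. It assumes `index_topk` caps the number of key positions each query attends to, which is how DeepSeek's sparse attention is publicly described. How the index scores are computed (from `index_n_heads` heads of width `index_head_dim`) is not shown, and `select_topk_positions` is a hypothetical helper, not part of this module.

```python
import heapq


def select_topk_positions(scores: list[float], index_topk: int = 2048) -> list[int]:
    """Return the positions of the highest index scores, in ascending order.

    At most `index_topk` positions are kept; shorter sequences keep every
    position, so the budget only bites once the context exceeds it.
    """
    k = min(index_topk, len(scores))
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


scores = [0.1, 0.9, 0.3, 0.7, 0.5]

# With a budget of 2, only the two best-scoring positions survive.
selected = select_topk_positions(scores, index_topk=2)  # [1, 3]

# With the default budget of 2048, a 5-token context is kept whole.
unrestricted = select_topk_positions(scores)  # [0, 1, 2, 3, 4]
```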

initialize()

classmethod initialize(pipeline_config, model_config=None)

Initializes a DeepseekV3_2Config instance from pipeline configuration.

This method creates a config instance and populates every field that can be determined from the pipeline configuration alone, without needing the state_dict. Fields that depend on the state_dict, such as norm_dtype and quant_config, should be set directly on the returned instance after calling this method.

Parameters:

  • pipeline_config
  • model_config

Returns:

An initialized DeepseekV3_2Config instance.

Return type:

Self
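
The two-phase pattern described above can be sketched with stand-in types. This is an illustration of the calling convention only, assuming the pipeline settings and the state_dict are plain dictionaries: `ConfigSketch`, the `"max_length"` key, and the `"norm.weight"` entry are hypothetical and do not reflect the real `PipelineConfig` or weight layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    """Hypothetical config with two kinds of fields."""

    max_seq_len: int                  # known from the pipeline settings
    norm_dtype: Optional[str] = None  # known only after inspecting weights

    @classmethod
    def initialize(cls, pipeline_settings: dict) -> "ConfigSketch":
        """Phase 1: fill only what the pipeline settings determine."""
        return cls(max_seq_len=pipeline_settings["max_length"])


# Phase 1: build the config before any weights are read.
config = ConfigSketch.initialize({"max_length": 163840})

# Phase 2: once the state_dict is available, set the dependent fields directly.
state_dict = {"norm.weight": {"dtype": "bfloat16"}}  # hypothetical layout
config.norm_dtype = state_dict["norm.weight"]["dtype"]
```

Splitting initialization this way lets callers construct and inspect the config before the weights are loaded, while fields such as `norm_dtype` stay unset until their source of truth exists.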

DeepseekV3_2Model

class max.pipelines.architectures.deepseekV3_2.DeepseekV3_2Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE)

Bases: DeepseekV3Model

A DeepSeek-V3.2 model.

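Parameters:

  • pipeline_config
  • session
  • devices
  • kv_cache_config
  • weights
  • adapter
  • return_logits
  • return_hidden_states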

load_model()

load_model(session)

Load the model with the given weights.

Parameters:

  • session (InferenceSession)

Return type:

Model

model_config_cls

model_config_cls

alias of DeepseekV3_2Config