Python module
max.pipelines.architectures.deepseekV3_modulev3
DeepSeek-V3 mixture-of-experts architecture for text generation.
DeepseekV3Config
class max.pipelines.architectures.deepseekV3_modulev3.DeepseekV3Config(*, dtype, kv_params, devices, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='noaux_tc', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, max_batch_context_length=131072, graph_mode='auto')
Bases: ArchConfigWithKVCache
Configuration for DeepseekV3 models (single-GPU, ModuleV3).
Parameters:
- dtype (DType)
- kv_params (KVCacheParams)
- devices (list[DeviceRef])
- vocab_size (int)
- hidden_size (int)
- intermediate_size (int)
- moe_intermediate_size (int)
- moe_layer_freq (int)
- num_hidden_layers (int)
- num_attention_heads (int)
- num_key_value_heads (int)
- n_shared_experts (int)
- n_routed_experts (int)
- routed_scaling_factor (float)
- kv_lora_rank (int)
- q_lora_rank (int)
- qk_rope_head_dim (int)
- v_head_dim (int)
- qk_nope_head_dim (int)
- topk_method (str)
- n_group (int)
- topk_group (int)
- num_experts_per_tok (int)
- first_k_dense_replace (int)
- norm_topk_prob (bool)
- hidden_act (str)
- max_position_embeddings (int)
- max_seq_len (int)
- rms_norm_eps (float)
- tie_word_embeddings (bool)
- rope_theta (float)
- rope_scaling (dict[str, Any] | None)
- rope_interleave (bool)
- scoring_func (str)
- attention_bias (bool)
- attention_dropout (float)
- max_batch_context_length (int)
- graph_mode (str)
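Several of the defaults above combine into quantities that are useful when sizing a deployment. The following is a minimal sketch, not the MAX implementation: `DeepseekV3Shapes` is a hypothetical stand-in that copies a few defaults from the signature, and the layer rule and latent KV width are assumptions based on the publicly described DeepSeek-V3 architecture rather than anything stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepseekV3Shapes:
    """Hypothetical stand-in holding a few DeepseekV3Config defaults."""

    num_hidden_layers: int = 61
    first_k_dense_replace: int = 3
    moe_layer_freq: int = 1
    n_routed_experts: int = 256
    n_group: int = 8
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64
    kv_lora_rank: int = 512

    def is_moe_layer(self, layer_idx: int) -> bool:
        # Assumed rule: the first `first_k_dense_replace` layers use a dense
        # MLP, and later layers are MoE every `moe_layer_freq` layers.
        return (
            layer_idx >= self.first_k_dense_replace
            and layer_idx % self.moe_layer_freq == 0
        )

    @property
    def num_moe_layers(self) -> int:
        return sum(self.is_moe_layer(i) for i in range(self.num_hidden_layers))

    @property
    def qk_head_dim(self) -> int:
        # Query/key head width: non-rotary part plus rotary part.
        return self.qk_nope_head_dim + self.qk_rope_head_dim

    @property
    def experts_per_group(self) -> int:
        return self.n_routed_experts // self.n_group

    @property
    def latent_kv_width(self) -> int:
        # Assumed per-token, per-layer width of the compressed KV entry in
        # multi-head latent attention (latent vector plus shared rotary key).
        return self.kv_lora_rank + self.qk_rope_head_dim


shapes = DeepseekV3Shapes()
print(
    shapes.num_moe_layers,
    shapes.qk_head_dim,
    shapes.experts_per_group,
    shapes.latent_kv_width,
)  # 58 192 32 576
```

With the defaults, 58 of the 61 layers are MoE layers and each routing group holds 32 experts.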
attention_bias
attention_bias: bool = False
attention_dropout
attention_dropout: float = 0.0
construct_kv_params()
static construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)
Parameters:
- huggingface_config (AutoConfig)
- pipeline_config (PipelineConfig)
- devices (list[DeviceRef])
- kv_cache_config (KVCacheConfig)
- cache_dtype (DType)
Return type:
devices
dtype
dtype: DType
first_k_dense_replace
first_k_dense_replace: int = 3
get_kv_params()
get_kv_params()
Returns the KV cache parameters to use when running the model.
Return type:
get_max_seq_len()
get_max_seq_len()
Returns the default maximum sequence length for the model.
Subclasses should determine whether this value can be overridden by
setting the --max-length (pipeline_config.model.max_length) flag.
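One way a subclass might reconcile the model default with a user-supplied --max-length is sketched below. `resolve_max_seq_len` is a hypothetical helper, and rejecting values above the model default is an assumed policy, not documented MAX behavior.

```python
from typing import Optional


def resolve_max_seq_len(model_default: int, user_max_length: Optional[int]) -> int:
    """Hypothetical helper: pick the effective maximum sequence length."""
    if user_max_length is None:
        # No override requested, so fall back to the model default.
        return model_default
    if user_max_length <= 0:
        raise ValueError("max_length must be a positive integer")
    if user_max_length > model_default:
        # Assumed policy: never exceed what the model configuration allows.
        raise ValueError(
            f"max_length={user_max_length} exceeds the model limit {model_default}"
        )
    return user_max_length


default_len = resolve_max_seq_len(163840, None)
capped_len = resolve_max_seq_len(163840, 8192)
```

A subclass that allows longer contexts than the default would relax the upper-bound check instead of raising.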
Return type:
get_num_layers()
static get_num_layers(huggingface_config)
Parameters:
- huggingface_config (AutoConfig)
Return type:
graph_mode
graph_mode: str = 'auto'
hidden_act
hidden_act: str = 'silu'
hidden_size
hidden_size: int = 7168
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initializes a DeepseekV3Config instance from pipeline configuration.
Parameters:
- pipeline_config (PipelineConfig)
- model_config (MAXModelConfig | None)
Return type:
intermediate_size
intermediate_size: int = 18432
kv_lora_rank
kv_lora_rank: int = 512
kv_params
kv_params: KVCacheParams
max_batch_context_length
max_batch_context_length: int = 131072
max_position_embeddings
max_position_embeddings: int = 4096
Maximum positional embeddings as defined by the original model.
max_seq_len
max_seq_len: int = 163840
Maximum sequence length as defined by the MAX Engine pipeline configuration.
moe_intermediate_size
moe_intermediate_size: int = 2048
moe_layer_freq
moe_layer_freq: int = 1
n_group
n_group: int = 8
n_routed_experts
n_routed_experts: int = 256
n_shared_experts
n_shared_experts: int = 1
norm_topk_prob
norm_topk_prob: bool = True
num_attention_heads
num_attention_heads: int = 128
num_experts_per_tok
num_experts_per_tok: int = 8
num_hidden_layers
num_hidden_layers: int = 61
num_key_value_heads
num_key_value_heads: int = 128
q_lora_rank
q_lora_rank: int = 1536
qk_nope_head_dim
qk_nope_head_dim: int = 128
qk_rope_head_dim
qk_rope_head_dim: int = 64
rms_norm_eps
rms_norm_eps: float = 1e-06
rope_interleave
rope_interleave: bool = True
rope_scaling
rope_theta
rope_theta: float = 10000.0
routed_scaling_factor
routed_scaling_factor: float = 2.5
scoring_func
scoring_func: str = 'sigmoid'
tie_word_embeddings
tie_word_embeddings: bool = False
topk_group
topk_group: int = 4
topk_method
topk_method: str = 'noaux_tc'
v_head_dim
v_head_dim: int = 128
vocab_size
vocab_size: int = 129280
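The routing attributes (scoring_func, n_group, topk_group, num_experts_per_tok, norm_topk_prob, routed_scaling_factor) interact in a specific order. The sketch below shows group-limited top-k routing for a single token as it is publicly described for DeepSeek-V3. It is plain Python for illustration only and is not taken from the MAX implementation; `route_token` and the per-expert `bias` argument (a selection-only correction term) are assumptions introduced here.

```python
import math
import random


def route_token(logits, bias, n_group=8, topk_group=4, top_k=8,
                norm_topk_prob=True, routed_scaling_factor=2.5):
    """Return (expert_ids, weights) for one token. Illustrative only."""
    # scoring_func='sigmoid': each expert gets an independent score in (0, 1).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # The bias influences which experts are selected, not their final weights.
    biased = [s + b for s, b in zip(scores, bias)]
    group_size = len(scores) // n_group

    # Rank each group by the sum of its two best biased scores.
    group_scores = []
    for g in range(n_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))
    kept = set(
        sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]
    )

    # Pick the top_k experts, restricted to the surviving groups.
    candidates = [i for i in range(len(scores)) if i // group_size in kept]
    expert_ids = sorted(candidates, key=lambda i: biased[i], reverse=True)[:top_k]

    # Weights come from the unbiased scores, optionally normalized, then scaled.
    weights = [scores[i] for i in expert_ids]
    if norm_topk_prob:
        total = sum(weights) + 1e-20
        weights = [w / total for w in weights]
    weights = [w * routed_scaling_factor for w in weights]
    return expert_ids, weights


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(256)]  # n_routed_experts=256
bias = [0.0] * 256
expert_ids, weights = route_token(logits, bias)
```

With the defaults, each token is sent to 8 experts drawn from at most 4 of the 8 groups, and the normalized weights sum to routed_scaling_factor.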
DeepseekV3Model
class max.pipelines.architectures.deepseekV3_modulev3.DeepseekV3Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE)
Bases: DeepseekV2Model
A DeepseekV3 model (ModuleV3, single-GPU).
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
get_kv_params()
classmethod get_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)
Returns the KV cache params for the pipeline model.
Delegates to model_config_cls.construct_kv_params(...).
Subclasses with custom KV behavior should override this method.
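The delegation and override pattern described above can be sketched with stand-in classes. Everything here is hypothetical: `BaseConfig`, `BaseModel`, `CustomKVModel`, the simplified two-argument signature, and the `page_size` key are illustrative and do not reflect the real MAX types or arguments.

```python
class BaseConfig:
    """Hypothetical config class that knows how to build KV cache params."""

    @staticmethod
    def construct_kv_params(num_layers: int, head_dim: int) -> dict:
        return {"num_layers": num_layers, "head_dim": head_dim}


class BaseModel:
    """Hypothetical pipeline model that points at its config class."""

    model_config_cls = BaseConfig

    @classmethod
    def get_kv_params(cls, num_layers: int, head_dim: int) -> dict:
        # Default behavior: delegate to the config class attached to the model.
        return cls.model_config_cls.construct_kv_params(num_layers, head_dim)


class CustomKVModel(BaseModel):
    """Hypothetical subclass with custom KV behavior."""

    @classmethod
    def get_kv_params(cls, num_layers: int, head_dim: int) -> dict:
        # Start from the delegated result, then adjust it.
        params = super().get_kv_params(num_layers, head_dim)
        params["page_size"] = 128  # illustrative extra field
        return params


default_params = BaseModel.get_kv_params(61, 576)
custom_params = CustomKVModel.get_kv_params(61, 576)
```

Because the default method reads `model_config_cls` from the class, a subclass can also change KV behavior by pointing that attribute at a different config class without overriding the method.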
load_model()
load_model()
model_config_cls
model_config_cls
alias of DeepseekV3Config