Python module

max.pipelines.architectures.deepseekV3_modulev3

DeepSeek-V3 mixture-of-experts architecture for text generation.

`DeepseekV3Config`

class max.pipelines.architectures.deepseekV3_modulev3.DeepseekV3Config(*, dtype, kv_params, devices, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='noaux_tc', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, max_batch_context_length=131072, graph_mode='auto')

Bases: ArchConfigWithKVCache

Configuration for DeepseekV3 models (single-GPU, ModuleV3).

Parameters:

dtype (DType)
kv_params (KVCacheParams)
devices (list[DeviceRef])
vocab_size (int)
hidden_size (int)
intermediate_size (int)
moe_intermediate_size (int)
moe_layer_freq (int)
num_hidden_layers (int)
num_attention_heads (int)
num_key_value_heads (int)
n_shared_experts (int)
n_routed_experts (int)
routed_scaling_factor (float)
kv_lora_rank (int)
q_lora_rank (int)
qk_rope_head_dim (int)
v_head_dim (int)
qk_nope_head_dim (int)
topk_method (str)
n_group (int)
topk_group (int)
num_experts_per_tok (int)
first_k_dense_replace (int)
norm_topk_prob (bool)
hidden_act (str)
max_position_embeddings (int)
max_seq_len (int)
rms_norm_eps (float)
tie_word_embeddings (bool)
rope_theta (float)
rope_scaling (dict[str, Any] | None)
rope_interleave (bool)
scoring_func (str)
attention_bias (bool)
attention_dropout (float)
max_batch_context_length (int)
graph_mode (str)
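
The MLA-related parameters above (q_lora_rank, kv_lora_rank, qk_nope_head_dim, qk_rope_head_dim, v_head_dim) only make sense together. Here is a minimal sketch of the weight shapes the signature defaults imply. It assumes the projection layout and weight names of the Hugging Face DeepSeek-V3 checkpoint (q_a_proj, kv_a_proj_with_mqa, and so on). MLADims and mla_projection_shapes are hypothetical helpers, not part of max.pipelines, and MAX's internal module names may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLADims:
    """Hypothetical stand-in holding the MLA-related DeepseekV3Config defaults."""

    hidden_size: int = 7168
    num_attention_heads: int = 128
    q_lora_rank: int = 1536
    kv_lora_rank: int = 512
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64
    v_head_dim: int = 128

    @property
    def qk_head_dim(self) -> int:
        # Each query/key head concatenates a non-rotary part and a rotary part.
        return self.qk_nope_head_dim + self.qk_rope_head_dim


def mla_projection_shapes(d: MLADims) -> dict[str, tuple[int, int]]:
    """(in_features, out_features) per projection, using Hugging Face names (assumed)."""
    return {
        # Queries: down-project to a low-rank latent, then up-project per head.
        "q_a_proj": (d.hidden_size, d.q_lora_rank),
        "q_b_proj": (d.q_lora_rank, d.num_attention_heads * d.qk_head_dim),
        # Keys/values: one shared latent plus a single decoupled RoPE key.
        "kv_a_proj_with_mqa": (d.hidden_size, d.kv_lora_rank + d.qk_rope_head_dim),
        "kv_b_proj": (
            d.kv_lora_rank,
            d.num_attention_heads * (d.qk_nope_head_dim + d.v_head_dim),
        ),
        # Output: concatenated value heads back to the hidden size.
        "o_proj": (d.num_attention_heads * d.v_head_dim, d.hidden_size),
    }


for name, shape in mla_projection_shapes(MLADims()).items():
    print(f"{name:20s} {shape}")
```

With the defaults, each query/key head is 192 wide (128 + 64). The key/value path compresses the 7168-wide hidden state to 576 values per token before expanding it per head.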

`attention_bias`

attention_bias: bool = False

`attention_dropout`

attention_dropout: float = 0.0

`construct_kv_params()`

static construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

Parameters:

huggingface_config (AutoConfig)
pipeline_config (PipelineConfig)
devices (list[DeviceRef])
kv_cache_config (KVCacheConfig)
cache_dtype (DType)

Return type:

KVCacheParams
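
construct_kv_params turns the Hugging Face config into KVCacheParams, but this page does not spell out the cache layout. The arithmetic below assumes the multi-head latent attention layout published with DeepSeek-V2 and DeepSeek-V3. Under that layout, each layer caches one kv_lora_rank-wide latent plus one qk_rope_head_dim-wide RoPE key per token. The sketch compares that with an uncompressed per-head K/V cache. The helper names are hypothetical, and the 2-byte element size is an assumed cache dtype.

```python
def mla_cache_elems_per_token(kv_lora_rank: int, qk_rope_head_dim: int) -> int:
    # Assumed MLA cache layout: one compressed KV latent plus one shared RoPE key.
    return kv_lora_rank + qk_rope_head_dim


def mha_cache_elems_per_token(num_kv_heads: int, k_head_dim: int, v_head_dim: int) -> int:
    # Uncompressed baseline: separate K and V vectors for every KV head.
    return num_kv_heads * (k_head_dim + v_head_dim)


def cache_bytes_per_token(elems_per_layer: int, num_layers: int, bytes_per_elem: int) -> int:
    # Total cache bytes one token occupies across all layers.
    return elems_per_layer * num_layers * bytes_per_elem


# Defaults from DeepseekV3Config.
mla = mla_cache_elems_per_token(kv_lora_rank=512, qk_rope_head_dim=64)
mha = mha_cache_elems_per_token(num_kv_heads=128, k_head_dim=128 + 64, v_head_dim=128)
print(mla, mha, mha // mla)  # 576 40960 71
# Assumed 2-byte cache dtype (for example bfloat16) over 61 layers.
print(cache_bytes_per_token(mla, num_layers=61, bytes_per_elem=2))  # 70272
```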

`devices`

devices: list[DeviceRef]

`dtype`

dtype: DType

`first_k_dense_replace`

first_k_dense_replace: int = 3
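
first_k_dense_replace appears here without a description. DeepSeek's reference modeling code, shipped with the Hugging Face checkpoint, combines it with moe_layer_freq and n_routed_experts to decide which layers use a mixture-of-experts MLP. The sketch below mirrors that rule. Whether MAX applies exactly the same rule is an assumption, and is_moe_layer and layer_layout are hypothetical helpers.

```python
from typing import Optional


def is_moe_layer(
    layer_idx: int,
    first_k_dense_replace: int = 3,
    moe_layer_freq: int = 1,
    n_routed_experts: Optional[int] = 256,
) -> bool:
    # The first `first_k_dense_replace` layers keep a dense MLP; after that,
    # every `moe_layer_freq`-th layer uses MoE (assumed reference-code rule).
    return (
        n_routed_experts is not None
        and layer_idx >= first_k_dense_replace
        and layer_idx % moe_layer_freq == 0
    )


def layer_layout(num_hidden_layers: int = 61, **kwargs) -> list[str]:
    # Label each decoder layer as "dense" or "moe".
    return ["moe" if is_moe_layer(i, **kwargs) else "dense" for i in range(num_hidden_layers)]


layout = layer_layout()
print(layout[:5])  # ['dense', 'dense', 'dense', 'moe', 'moe']
print(layout.count("dense"), layout.count("moe"))  # 3 58
```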

`get_kv_params()`

get_kv_params()

Returns the KV cache parameters to use when running the model.

Return type: KVCacheParams

`get_max_seq_len()`

get_max_seq_len()

Returns the default maximum sequence length for the model.

Subclasses should determine whether this value can be overridden by setting the --max-length (pipeline_config.model.max_length) flag.
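
get_max_seq_len() leaves the override policy to subclasses. Below is one possible policy, sketched under that assumption. It uses the model default when --max-length is not set and rejects values above it. resolve_max_seq_len is a hypothetical helper, and MAX's actual validation may differ, for example by warning instead of raising.

```python
from typing import Optional


def resolve_max_seq_len(default_max_seq_len: int, max_length_override: Optional[int]) -> int:
    """Hypothetical override policy for the --max-length flag."""
    if max_length_override is None:
        # No --max-length given: fall back to the model's default.
        return default_max_seq_len
    if max_length_override <= 0:
        raise ValueError("max_length must be positive")
    if max_length_override > default_max_seq_len:
        # Assumed policy: refuse to extend past what the model supports.
        raise ValueError(
            f"max_length {max_length_override} exceeds model limit {default_max_seq_len}"
        )
    return max_length_override


print(resolve_max_seq_len(163840, None))   # 163840
print(resolve_max_seq_len(163840, 32768))  # 32768
```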

Return type: int

`get_num_layers()`

static get_num_layers(huggingface_config)

Parameters: huggingface_config (AutoConfig)

Return type: int

`graph_mode`

graph_mode: str = 'auto'

`hidden_act`

hidden_act: str = 'silu'

`hidden_size`

hidden_size: int = 7168

`initialize()`

classmethod initialize(pipeline_config, model_config=None)

Initializes a DeepseekV3Config instance from a pipeline configuration.

Parameters:

pipeline_config (PipelineConfig)
model_config (MAXModelConfig | None)

Return type:

Self
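
initialize() builds the config from the pipeline configuration, which carries the model's Hugging Face config. The sketch below shows only the common pattern of copying the matching keys and keeping the defaults for the rest, applied to a plain dict. TinyDeepseekV3Config and from_hf_dict are hypothetical. The real method also has to supply fields that have no default in the signature, such as dtype, devices, and kv_params.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class TinyDeepseekV3Config:
    """Hypothetical subset of DeepseekV3Config fields, defaults copied from the signature."""

    vocab_size: int = 129280
    hidden_size: int = 7168
    num_hidden_layers: int = 61
    n_routed_experts: int = 256
    num_experts_per_tok: int = 8
    rope_scaling: Optional[dict[str, Any]] = None

    @classmethod
    def from_hf_dict(cls, hf: dict[str, Any]) -> "TinyDeepseekV3Config":
        # Keep only keys that match a field. Missing fields use their defaults;
        # unrelated keys (e.g. "architectures") are ignored.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in hf.items() if k in names})


hf_config = {
    "architectures": ["DeepseekV3ForCausalLM"],
    "hidden_size": 1024,
    "num_hidden_layers": 4,
}
cfg = TinyDeepseekV3Config.from_hf_dict(hf_config)
print(cfg.hidden_size, cfg.num_hidden_layers, cfg.vocab_size)  # 1024 4 129280
```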

`intermediate_size`

intermediate_size: int = 18432

`kv_lora_rank`

kv_lora_rank: int = 512

`kv_params`

kv_params: KVCacheParams

`max_batch_context_length`

max_batch_context_length: int = 131072

`max_position_embeddings`

max_position_embeddings: int = 4096

Maximum number of position embeddings, as defined by the original model's configuration.

`max_seq_len`

max_seq_len: int = 163840

Maximum sequence length as defined by the MAX Engine pipeline configuration.
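
The default max_seq_len (163840) is exactly 40 × the default max_position_embeddings (4096). The published DeepSeek-V3 config.json specifies a YaRN rope_scaling entry with factor 40 and original_max_position_embeddings 4096. Assuming such an entry, the extended context length can be recovered as below. yarn_context_length is a hypothetical helper, and this page does not state how MAX derives its default.

```python
from typing import Any, Optional


def yarn_context_length(max_position_embeddings: int, rope_scaling: Optional[dict[str, Any]]) -> int:
    """Context length implied by a YaRN rope_scaling dict (hypothetical helper)."""
    # Accept either the "type" or the "rope_type" key for the scaling kind.
    if not rope_scaling or rope_scaling.get("type", rope_scaling.get("rope_type")) != "yarn":
        return max_position_embeddings
    original = rope_scaling.get("original_max_position_embeddings", max_position_embeddings)
    return int(original * rope_scaling["factor"])


# Shaped like the rope_scaling entry in the published DeepSeek-V3 config.json.
rope_scaling = {"type": "yarn", "factor": 40, "original_max_position_embeddings": 4096}
print(yarn_context_length(4096, rope_scaling))  # 163840
print(yarn_context_length(4096, None))          # 4096
```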

`moe_intermediate_size`

moe_intermediate_size: int = 2048
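
moe_intermediate_size is much smaller than intermediate_size because each MoE layer activates several experts per token. The arithmetic below makes two assumptions taken from DeepSeek's reference code. First, every MLP (dense or expert) is a gated block with gate, up, and down projections and no biases. Second, the single shared expert has width moe_intermediate_size × n_shared_experts. The helper is hypothetical, and router weights are ignored.

```python
def gated_mlp_params(hidden_size: int, intermediate_size: int) -> int:
    # Assumes a gated (SwiGLU-style) MLP: gate, up, and down projections, no biases.
    return 3 * hidden_size * intermediate_size


hidden_size = 7168
dense = gated_mlp_params(hidden_size, 18432)   # intermediate_size
expert = gated_mlp_params(hidden_size, 2048)   # moe_intermediate_size
active_experts = 8 + 1                         # num_experts_per_tok + n_shared_experts
total_experts = 256 + 1                        # n_routed_experts + n_shared_experts

print(active_experts * 2048)             # 18432: active MoE width per token
print(active_experts * expert == dense)  # True
print(total_experts * expert)            # 11318329344 expert weights per MoE layer
```

With the defaults, the nine active experts (eight routed plus one shared) add up to the same 18432-wide MLP used by the dense layers. Each MoE layer, however, stores 257 experts' worth of weights.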

`moe_layer_freq`

moe_layer_freq: int = 1

`n_group`

n_group: int = 8

`n_routed_experts`

n_routed_experts: int = 256

`n_shared_experts`

n_shared_experts: int = 1

`norm_topk_prob`

norm_topk_prob: bool = True

`num_attention_heads`

num_attention_heads: int = 128

`num_experts_per_tok`

num_experts_per_tok: int = 8

`num_hidden_layers`

num_hidden_layers: int = 61

`num_key_value_heads`

num_key_value_heads: int = 128

`q_lora_rank`

q_lora_rank: int = 1536

`qk_nope_head_dim`

qk_nope_head_dim: int = 128

`qk_rope_head_dim`

qk_rope_head_dim: int = 64

`rms_norm_eps`

rms_norm_eps: float = 1e-06

`rope_interleave`

rope_interleave: bool = True

`rope_scaling`

rope_scaling: dict[str, Any] | None = None
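
In DeepSeek's reference modeling code, rope_scaling also changes the attention softmax scale. When a YaRN entry sets mscale_all_dim, the usual 1/sqrt(qk_head_dim) factor is multiplied by the square of a log-based correction. The sketch mirrors that reference logic. Whether MAX applies the identical correction is an assumption, and the helper names are illustrative.

```python
import math
from typing import Any, Optional


def yarn_get_mscale(scale: float, mscale: float = 1.0) -> float:
    # YaRN attention-temperature correction; no change when not extending context.
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def mla_softmax_scale(qk_head_dim: int, rope_scaling: Optional[dict[str, Any]]) -> float:
    # Start from the standard 1/sqrt(head_dim) attention scale.
    scale = qk_head_dim ** -0.5
    if rope_scaling is not None:
        mscale_all_dim = rope_scaling.get("mscale_all_dim", 0)
        if mscale_all_dim:
            m = yarn_get_mscale(rope_scaling["factor"], mscale_all_dim)
            scale *= m * m
    return scale


rope_scaling = {"type": "yarn", "factor": 40, "mscale": 1.0, "mscale_all_dim": 1.0,
                "original_max_position_embeddings": 4096}
# qk_head_dim = qk_nope_head_dim + qk_rope_head_dim with the defaults.
print(mla_softmax_scale(128 + 64, None))
print(mla_softmax_scale(128 + 64, rope_scaling))
```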

`rope_theta`

rope_theta: float = 10000.0

`routed_scaling_factor`

routed_scaling_factor: float = 2.5

`scoring_func`

scoring_func: str = 'sigmoid'

`tie_word_embeddings`

tie_word_embeddings: bool = False

`topk_group`

topk_group: int = 4

`topk_method`

topk_method: str = 'noaux_tc'
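
Together, topk_method='noaux_tc', scoring_func, n_group, topk_group, num_experts_per_tok, norm_topk_prob, and routed_scaling_factor describe DeepSeek-V3's auxiliary-loss-free, group-limited expert routing. The pure-Python sketch below follows the gate in DeepSeek's published reference code for a single token. The bias argument stands in for the per-expert score-correction bias stored in the checkpoint; its naming and handling here are assumptions. MAX's kernel-based implementation will differ in form.

```python
import math
from typing import Optional, Sequence


def route_noaux_tc(
    logits: Sequence[float],
    bias: Optional[Sequence[float]] = None,
    n_group: int = 8,
    topk_group: int = 4,
    top_k: int = 8,
    norm_topk_prob: bool = True,
    routed_scaling_factor: float = 2.5,
) -> list[tuple[int, float]]:
    """Group-limited top-k routing for one token; returns (expert_id, weight) pairs."""
    n_experts = len(logits)
    per_group = n_experts // n_group
    if bias is None:
        bias = [0.0] * n_experts
    # scoring_func='sigmoid': each expert gets an independent affinity in (0, 1).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # The correction bias only influences which experts are selected.
    choice = [s + b for s, b in zip(scores, bias)]
    # Rate each group by the sum of its two highest biased scores.
    group_scores = [
        sum(sorted(choice[g * per_group:(g + 1) * per_group], reverse=True)[:2])
        for g in range(n_group)
    ]
    kept = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]
    # Only experts inside the kept groups remain eligible.
    candidates = [e for g in kept for e in range(g * per_group, (g + 1) * per_group)]
    chosen = sorted(candidates, key=lambda e: choice[e], reverse=True)[:top_k]
    # Gate weights use the unbiased scores of the chosen experts.
    weights = [scores[e] for e in chosen]
    if top_k > 1 and norm_topk_prob:
        denom = sum(weights) + 1e-20
        weights = [w / denom for w in weights]
    return [(e, w * routed_scaling_factor) for e, w in zip(chosen, weights)]


logits = [0.1, -1.0, 2.0, 0.5, -0.3, 1.5, -2.0, 0.0]
result = route_noaux_tc(logits, n_group=4, topk_group=2, top_k=2)
print([e for e, _ in result])               # [2, 5]
print(round(sum(w for _, w in result), 6))  # 2.5
```

In the small example, only groups {2, 3} and {4, 5} survive the group filter. Expert 0 is therefore never considered, even though its score is higher than expert 4's.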

`v_head_dim`

v_head_dim: int = 128

`vocab_size`

vocab_size: int = 129280

`DeepseekV3Model`

class max.pipelines.architectures.deepseekV3_modulev3.DeepseekV3Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE)

Bases: DeepseekV2Model

A DeepseekV3 model (ModuleV3, single-GPU).

Parameters:

pipeline_config (PipelineConfig)
session (InferenceSession)
devices (list[Device])
kv_cache_config (KVCacheConfig)
weights (Weights)
adapter (WeightsAdapter | None)
return_logits (ReturnLogits)
return_hidden_states (ReturnHiddenStates)

`get_kv_params()`

classmethod get_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

Returns the KV cache params for the pipeline model.

Delegates to model_config_cls.construct_kv_params(...). Subclasses with custom KV behavior should override this method.

Parameters:

huggingface_config (AutoConfig)
pipeline_config (Any)
devices (list[DeviceRef])
kv_cache_config (Any)
cache_dtype (DType)

Return type:

KVCacheParamInterface
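
The delegation described above can be sketched with stand-in classes. HypotheticalConfig, HypotheticalPipelineModel, CustomKVModel, and the plain dict they return in place of KVCacheParams are all hypothetical. Only the shape of the pattern follows this page: a classmethod that forwards to model_config_cls.construct_kv_params and that subclasses can override.

```python
from typing import Any


class HypotheticalConfig:
    @staticmethod
    def construct_kv_params(
        huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype
    ) -> dict[str, Any]:
        # Stand-in for a config class's construct_kv_params: derive cache info from the HF config.
        return {
            "n_layers": huggingface_config["num_hidden_layers"],
            "dim_per_token": huggingface_config["kv_lora_rank"]
            + huggingface_config["qk_rope_head_dim"],
            "dtype": cache_dtype,
        }


class HypotheticalPipelineModel:
    model_config_cls = HypotheticalConfig

    @classmethod
    def get_kv_params(cls, huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype):
        # Default behavior: delegate to the config class bound on this model.
        return cls.model_config_cls.construct_kv_params(
            huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype
        )


class CustomKVModel(HypotheticalPipelineModel):
    @classmethod
    def get_kv_params(cls, *args, **kwargs):
        # A subclass with custom KV behavior overrides the hook and adjusts the result.
        params = super().get_kv_params(*args, **kwargs)
        return {**params, "custom": True}


hf = {"num_hidden_layers": 61, "kv_lora_rank": 512, "qk_rope_head_dim": 64}
print(HypotheticalPipelineModel.get_kv_params(hf, None, [], None, "bfloat16"))
print(CustomKVModel.get_kv_params(hf, None, [], None, "bfloat16")["custom"])  # True
```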

`load_model()`

load_model()

Return type: Callable[[…], Any]

`model_config_cls`

model_config_cls

alias of DeepseekV3Config
