Python module

max.pipelines.architectures.deepseekV3_modulev3

DeepSeek-V3 mixture-of-experts architecture for text generation.

DeepseekV3Config​

class max.pipelines.architectures.deepseekV3_modulev3.DeepseekV3Config(*, dtype, kv_params, devices, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='noaux_tc', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, max_batch_context_length=131072, graph_mode='auto')

source

Bases: ArchConfigWithKVCache

Configuration for DeepseekV3 models (single-GPU, ModuleV3).

Parameters:

  • dtype (DType)
  • kv_params (KVCacheParams)
  • devices (list[DeviceRef])
  • vocab_size (int)
  • hidden_size (int)
  • intermediate_size (int)
  • moe_intermediate_size (int)
  • moe_layer_freq (int)
  • num_hidden_layers (int)
  • num_attention_heads (int)
  • num_key_value_heads (int)
  • n_shared_experts (int)
  • n_routed_experts (int)
  • routed_scaling_factor (float)
  • kv_lora_rank (int)
  • q_lora_rank (int)
  • qk_rope_head_dim (int)
  • v_head_dim (int)
  • qk_nope_head_dim (int)
  • topk_method (str)
  • n_group (int)
  • topk_group (int)
  • num_experts_per_tok (int)
  • first_k_dense_replace (int)
  • norm_topk_prob (bool)
  • hidden_act (str)
  • max_position_embeddings (int)
  • max_seq_len (int)
  • rms_norm_eps (float)
  • tie_word_embeddings (bool)
  • rope_theta (float)
  • rope_scaling (dict[str, Any] | None)
  • rope_interleave (bool)
  • scoring_func (str)
  • attention_bias (bool)
  • attention_dropout (float)
  • max_batch_context_length (int)
  • graph_mode (str)

attention_bias​

attention_bias: bool = False

source

attention_dropout​

attention_dropout: float = 0.0

source

construct_kv_params()​

static construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

source

Parameters:

  • huggingface_config
  • pipeline_config
  • devices
  • kv_cache_config
  • cache_dtype

Return type:

KVCacheParams

devices​

devices: list[DeviceRef]

source

dtype​

dtype: DType

source

first_k_dense_replace​

first_k_dense_replace: int = 3

source
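
The defaults first_k_dense_replace=3, moe_layer_freq=1, and num_hidden_layers=61 together decide which decoder layers use a dense MLP and which use mixture-of-experts. The following is a minimal sketch of that split, assuming the rule used by the Hugging Face reference implementation of DeepSeek-V3 (a layer is MoE when its index is at least first_k_dense_replace and divisible by moe_layer_freq). The helper is_moe_layer is hypothetical and is not part of the MAX API; this page does not state how the MAX implementation makes the choice.

```python
def is_moe_layer(layer_idx, first_k_dense_replace=3, moe_layer_freq=1):
    """Hypothetical helper: True if the layer would use routed experts.

    Assumes the Hugging Face reference rule, not a documented MAX rule.
    """
    return (
        layer_idx >= first_k_dense_replace
        and layer_idx % moe_layer_freq == 0
    )


num_hidden_layers = 61  # default from DeepseekV3Config

moe_layers = [i for i in range(num_hidden_layers) if is_moe_layer(i)]
dense_layers = [i for i in range(num_hidden_layers) if not is_moe_layer(i)]

print("dense layers:", dense_layers)
print("number of MoE layers:", len(moe_layers))
```

Under these assumptions the first three layers are dense and the remaining layers are MoE.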

get_kv_params()​

get_kv_params()

source

Returns the KV cache parameters to use when running the model.

Return type:

KVCacheParams

get_max_seq_len()​

get_max_seq_len()

source

Returns the default maximum sequence length for the model.

Subclasses decide whether the --max-length flag (pipeline_config.model.max_length) can override this value.

Return type:

int

get_num_layers()​

static get_num_layers(huggingface_config)

source

Parameters:

huggingface_config (AutoConfig)

Return type:

int

graph_mode​

graph_mode: str = 'auto'

source

hidden_act​

hidden_act: str = 'silu'

source

hidden_size​

hidden_size: int = 7168

source

initialize()​

classmethod initialize(pipeline_config, model_config=None)

source

Initializes a DeepseekV3Config instance from pipeline configuration.

Parameters:

  • pipeline_config
  • model_config (optional, defaults to None)

Return type:

Self

intermediate_size​

intermediate_size: int = 18432

source

kv_lora_rank​

kv_lora_rank: int = 512

source

kv_params​

kv_params: KVCacheParams

source

max_batch_context_length​

max_batch_context_length: int = 131072

source

max_position_embeddings​

max_position_embeddings: int = 4096

source

Maximum positional embeddings as defined by the original model.

max_seq_len​

max_seq_len: int = 163840

source

Maximum sequence length as defined by the MAX Engine pipeline configuration.
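
The two defaults above differ by a factor of 40. One plausible reading, which this page does not confirm, is that max_seq_len corresponds to the original 4096-token window extended by the YaRN scaling factor published in the DeepSeek-V3 Hugging Face config. The sketch below only checks that arithmetic; yarn_factor is an assumption taken from that published config, since rope_scaling defaults to None here.

```python
max_position_embeddings = 4096  # default from DeepseekV3Config
max_seq_len = 163840            # default from DeepseekV3Config

# Assumption: "factor" from the published DeepSeek-V3 rope_scaling dict.
# It is not a default of DeepseekV3Config, where rope_scaling is None.
yarn_factor = 40

extended_window = max_position_embeddings * yarn_factor
print(extended_window == max_seq_len)
```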

moe_intermediate_size​

moe_intermediate_size: int = 2048

source
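
To give a feel for what moe_intermediate_size=2048 implies, here is a rough back-of-the-envelope count of routed-expert weights. It assumes each expert is a SiLU-gated MLP with three weight matrices (gate, up, and down projections, no biases) and that every layer after the first first_k_dense_replace layers is an MoE layer. Both are assumptions based on the public DeepSeek-V3 architecture, not statements from this page, and the count ignores shared experts, attention, and embeddings.

```python
hidden_size = 7168            # default from DeepseekV3Config
moe_intermediate_size = 2048  # default from DeepseekV3Config
n_routed_experts = 256        # default from DeepseekV3Config
num_hidden_layers = 61        # default from DeepseekV3Config
first_k_dense_replace = 3     # default from DeepseekV3Config

# Assumption: gate, up, and down projections, each hidden x intermediate.
params_per_expert = 3 * hidden_size * moe_intermediate_size

# Assumption: all layers after the dense prefix are MoE (moe_layer_freq=1).
num_moe_layers = num_hidden_layers - first_k_dense_replace

routed_params_per_layer = params_per_expert * n_routed_experts
routed_params_total = routed_params_per_layer * num_moe_layers

print(f"per expert: {params_per_expert:,}")
print(f"routed experts, all MoE layers: {routed_params_total:,}")
```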

moe_layer_freq​

moe_layer_freq: int = 1

source

n_group​

n_group: int = 8

source

n_routed_experts​

n_routed_experts: int = 256

source

n_shared_experts​

n_shared_experts: int = 1

source

norm_topk_prob​

norm_topk_prob: bool = True

source

num_attention_heads​

num_attention_heads: int = 128

source

num_experts_per_tok​

num_experts_per_tok: int = 8

source

num_hidden_layers​

num_hidden_layers: int = 61

source

num_key_value_heads​

num_key_value_heads: int = 128

source

q_lora_rank​

q_lora_rank: int = 1536

source

qk_nope_head_dim​

qk_nope_head_dim: int = 128

source

qk_rope_head_dim​

qk_rope_head_dim: int = 64

source
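
The head-dimension fields (qk_nope_head_dim, qk_rope_head_dim, v_head_dim, kv_lora_rank) describe multi-head latent attention (MLA). As a sketch of how they combine, assuming the MLA design from the DeepSeek-V2 and DeepSeek-V3 papers (each token caches one compressed KV latent plus one shared RoPE key), the per-token cache footprint can be compared with plain multi-head attention. The actual cache layout chosen by KVCacheParams is not described on this page, so treat the numbers as illustrative.

```python
qk_nope_head_dim = 128     # default from DeepseekV3Config
qk_rope_head_dim = 64      # default from DeepseekV3Config
v_head_dim = 128           # default from DeepseekV3Config
kv_lora_rank = 512         # default from DeepseekV3Config
num_attention_heads = 128  # default from DeepseekV3Config

# Each query/key head concatenates a non-rotary part and a rotary part.
qk_head_dim = qk_nope_head_dim + qk_rope_head_dim

# Assumption: MLA caches the compressed latent plus the shared RoPE key.
mla_elements_per_token = kv_lora_rank + qk_rope_head_dim

# For comparison: caching full keys and values for every head.
mha_elements_per_token = num_attention_heads * (qk_head_dim + v_head_dim)

print("qk_head_dim:", qk_head_dim)
print("MLA cache elements per token per layer:", mla_elements_per_token)
print("MHA cache elements per token per layer:", mha_elements_per_token)
```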

rms_norm_eps​

rms_norm_eps: float = 1e-06

source

rope_interleave​

rope_interleave: bool = True

source

rope_scaling​

rope_scaling: dict[str, Any] | None = None

source

rope_theta​

rope_theta: float = 10000.0

source

routed_scaling_factor​

routed_scaling_factor: float = 2.5

source

scoring_func​

scoring_func: str = 'sigmoid'

source

tie_word_embeddings​

tie_word_embeddings: bool = False

source

topk_group​

topk_group: int = 4

source

topk_method​

topk_method: str = 'noaux_tc'

source
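
The routing fields (scoring_func, topk_method, n_group, topk_group, num_experts_per_tok, norm_topk_prob, routed_scaling_factor) work together to choose experts for each token. Below is a minimal pure-Python sketch for a single token, assuming the group-limited top-k scheme used by the Hugging Face reference implementation for topk_method='noaux_tc'. The function route_token and its bias argument (standing in for the learned score-correction bias) are hypothetical names; the MAX implementation may differ in detail.

```python
import math


def _sigmoid(x):
    # Numerically stable sigmoid for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def route_token(logits, bias=None, n_group=8, topk_group=4, top_k=8,
                norm_topk_prob=True, routed_scaling_factor=2.5):
    """Hypothetical sketch: return (expert_ids, weights) for one token."""
    n_experts = len(logits)
    group_size = n_experts // n_group
    if bias is None:
        bias = [0.0] * n_experts

    # scoring_func='sigmoid': independent score per expert.
    scores = [_sigmoid(x) for x in logits]
    # The bias only influences which experts are selected, not the weights.
    biased = [s + b for s, b in zip(scores, bias)]

    # Score each group by the sum of its two best biased scores.
    group_scores = []
    for g in range(n_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size],
                         reverse=True)
        group_scores.append(sum(members[:2]))

    # Keep the best topk_group groups, then pick top_k experts within them.
    kept = set(sorted(range(n_group), key=lambda g: group_scores[g],
                      reverse=True)[:topk_group])
    candidates = [e for e in range(n_experts) if e // group_size in kept]
    chosen = sorted(candidates, key=lambda e: biased[e], reverse=True)[:top_k]

    weights = [scores[e] for e in chosen]
    if norm_topk_prob:
        total = sum(weights) + 1e-20
        weights = [w / total for w in weights]
    weights = [w * routed_scaling_factor for w in weights]
    return chosen, weights


# Usage with 256 synthetic router logits (n_routed_experts=256).
example_logits = [math.sin(i) for i in range(256)]
expert_ids, expert_weights = route_token(example_logits)
print(expert_ids)
print(sum(expert_weights))
```

With norm_topk_prob=True the selected weights sum to routed_scaling_factor, and the selected experts come from at most topk_group groups.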

v_head_dim​

v_head_dim: int = 128

source

vocab_size​

vocab_size: int = 129280

source

DeepseekV3Model​

class max.pipelines.architectures.deepseekV3_modulev3.DeepseekV3Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE)

source

Bases: DeepseekV2Model

A DeepseekV3 model (ModuleV3, single-GPU).

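Parameters:

  • pipeline_config
  • session
  • devices
  • kv_cache_config
  • weights
  • adapter (optional, defaults to None)
  • return_logits (defaults to ReturnLogits.ALL)
  • return_hidden_states (defaults to ReturnHiddenStates.NONE)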

get_kv_params()​

classmethod get_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

source

Returns the KV cache params for the pipeline model.

Delegates to model_config_cls.construct_kv_params(...). Subclasses with custom KV behavior should override this method.

Parameters:

  • huggingface_config (AutoConfig)
  • pipeline_config (Any)
  • devices (list[DeviceRef])
  • kv_cache_config (Any)
  • cache_dtype (DType)

Return type:

KVCacheParamInterface
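
The delegation described above can be illustrated with stand-in classes. This is a sketch of the pattern only: ToyKVParams, ToyConfig, ToyModel, and ToyCustomModel are hypothetical and do not exist in MAX, and the real construct_kv_params computes far more than shown here.

```python
from dataclasses import dataclass


@dataclass
class ToyKVParams:
    """Hypothetical stand-in for a KV cache parameter object."""
    n_layers: int
    cache_dtype: str


class ToyConfig:
    """Hypothetical stand-in for a config class such as DeepseekV3Config."""

    @staticmethod
    def construct_kv_params(huggingface_config, pipeline_config, devices,
                            kv_cache_config, cache_dtype):
        return ToyKVParams(
            n_layers=huggingface_config["num_hidden_layers"],
            cache_dtype=cache_dtype,
        )


class ToyModel:
    """Hypothetical stand-in for a pipeline model."""

    model_config_cls = ToyConfig

    @classmethod
    def get_kv_params(cls, huggingface_config, pipeline_config, devices,
                      kv_cache_config, cache_dtype):
        # Delegate to whichever config class the model points at.
        return cls.model_config_cls.construct_kv_params(
            huggingface_config, pipeline_config, devices,
            kv_cache_config, cache_dtype,
        )


class ToyCustomModel(ToyModel):
    """A subclass with custom KV behavior overrides the method."""

    @classmethod
    def get_kv_params(cls, huggingface_config, pipeline_config, devices,
                      kv_cache_config, cache_dtype):
        params = super().get_kv_params(
            huggingface_config, pipeline_config, devices,
            kv_cache_config, cache_dtype,
        )
        params.n_layers += 1  # illustrative tweak only
        return params


toy_params = ToyModel.get_kv_params(
    {"num_hidden_layers": 61}, None, [], None, "bfloat16"
)
print(toy_params)
```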

load_model()​

load_model()

source

Return type:

Callable[[…], Any]

model_config_cls​

model_config_cls

source

alias of DeepseekV3Config