Python module
max.pipelines.architectures.deepseekV3
DeepSeek-V3 mixture-of-experts architecture for text generation.
DeepseekV3Config
class max.pipelines.architectures.deepseekV3.DeepseekV3Config(*, dtype, kv_params, devices, use_subgraphs=True, data_parallel_degree=1, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='greedy', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, norm_dtype=bfloat16, gate_dtype=None, correction_bias_dtype=None, max_batch_context_length=131072, quant_config=None, ep_config=None, graph_mode='auto', return_logits=ReturnLogits.LAST_TOKEN, return_hidden_states=ReturnHiddenStates.NONE, eagle_aux_hidden_state_layer_ids=None)
Bases: ArchConfigWithKVCache
Configuration for DeepseekV3 models.
-
Parameters:
-
- dtype (DType)
- kv_params (KVCacheParamInterface)
- devices (list[DeviceRef])
- use_subgraphs (bool)
- data_parallel_degree (int)
- vocab_size (int)
- hidden_size (int)
- intermediate_size (int)
- moe_intermediate_size (int)
- moe_layer_freq (int)
- num_hidden_layers (int)
- num_attention_heads (int)
- num_key_value_heads (int)
- n_shared_experts (int)
- n_routed_experts (int)
- routed_scaling_factor (float)
- kv_lora_rank (int)
- q_lora_rank (int)
- qk_rope_head_dim (int)
- v_head_dim (int)
- qk_nope_head_dim (int)
- topk_method (str)
- n_group (int)
- topk_group (int)
- num_experts_per_tok (int)
- first_k_dense_replace (int)
- norm_topk_prob (bool)
- hidden_act (str)
- max_position_embeddings (int)
- max_seq_len (int)
- rms_norm_eps (float)
- tie_word_embeddings (bool)
- rope_theta (float)
- rope_scaling (dict[str, Any] | None)
- rope_interleave (bool)
- scoring_func (str)
- attention_bias (bool)
- attention_dropout (float)
- norm_dtype (DType)
- gate_dtype (DType | None)
- correction_bias_dtype (DType | None)
- max_batch_context_length (int)
- quant_config (QuantConfig | None)
- ep_config (EPConfig | None)
- graph_mode (str)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
- eagle_aux_hidden_state_layer_ids (list[int] | None)
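The routing fields above (n_routed_experts, n_group, topk_group, num_experts_per_tok, scoring_func, norm_topk_prob, routed_scaling_factor) describe group-limited top-k expert routing. The sketch below illustrates that scheme with the default values; the group-scoring rule here (best expert per group) is one common variant and is not taken from the MAX kernels, so treat it as an assumption.

```python
import math
import random

def route(logits, n_group=8, topk_group=4, top_k=8, scale=2.5):
    """Group-limited top-k routing sketch: sigmoid scoring, keep the best
    topk_group groups, then pick top_k experts among the survivors."""
    n = len(logits)
    scores = [1 / (1 + math.exp(-x)) for x in logits]  # scoring_func='sigmoid'
    group_size = n // n_group
    # Score each group by its best expert and keep the topk_group groups.
    group_scores = [max(scores[g * group_size:(g + 1) * group_size])
                    for g in range(n_group)]
    kept = sorted(range(n_group), key=lambda g: group_scores[g],
                  reverse=True)[:topk_group]
    candidates = [i for g in kept
                  for i in range(g * group_size, (g + 1) * group_size)]
    # Top-k experts among the surviving groups.
    chosen = sorted(candidates, key=lambda i: scores[i], reverse=True)[:top_k]
    # norm_topk_prob=True normalizes the kept weights; routed_scaling_factor
    # then rescales them.
    total = sum(scores[i] for i in chosen)
    return {i: scale * scores[i] / total for i in chosen}

random.seed(0)
weights = route([random.gauss(0, 1) for _ in range(256)])
```

With the defaults, each token is routed to 8 of 256 experts drawn from at most 4 of the 8 expert groups, and the routing weights sum to routed_scaling_factor.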
attention_bias
attention_bias: bool = False
attention_dropout
attention_dropout: float = 0.0
construct_kv_params()
static construct_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)
-
Parameters:
-
- huggingface_config (AutoConfig)
- pipeline_config (PipelineConfig)
- devices (list[DeviceRef])
- kv_cache_config (KVCacheConfig)
- cache_dtype (DType)
-
Return type:
correction_bias_dtype
correction_bias_dtype: DType | None = None
data_parallel_degree
data_parallel_degree: int = 1
devices
devices: list[DeviceRef]
dtype
dtype: DType
eagle_aux_hidden_state_layer_ids
eagle_aux_hidden_state_layer_ids: list[int] | None = None
Optional explicit hidden-state capture layer ids for EAGLE3.
ep_config
ep_config: EPConfig | None = None
first_k_dense_replace
first_k_dense_replace: int = 3
gate_dtype
gate_dtype: DType | None = None
get_kv_params()
get_kv_params()
KV cache parameters to use when running the model.
-
Return type:
get_max_seq_len()
get_max_seq_len()
Returns the default maximum sequence length for the model.
Subclasses should determine whether this value can be overridden by
setting the --max-length (pipeline_config.model.max_length) flag.
-
Return type:
get_num_layers()
static get_num_layers(huggingface_config)
-
Parameters:
-
huggingface_config (AutoConfig)
-
Return type:
graph_mode
graph_mode: str = 'auto'
hidden_act
hidden_act: str = 'silu'
hidden_size
hidden_size: int = 7168
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initializes a DeepseekV3Config instance from pipeline configuration.
This method creates a config instance with all fields that can be determined from the pipeline configuration, without needing the state_dict. Fields that depend on the state_dict (like norm_dtype, quant_config, etc.) should be set via the finalize() method.
-
Parameters:
-
- pipeline_config (PipelineConfig) – The MAX Engine pipeline configuration.
- model_config (MAXModelConfig | None)
-
Returns:
-
An initialized DeepseekV3Config instance.
-
Return type:
intermediate_size
intermediate_size: int = 18432
kv_lora_rank
kv_lora_rank: int = 512
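The low kv_lora_rank is what makes multi-head latent attention (MLA) cheap to cache: only the compressed latent plus the RoPE key part is stored per token, rather than full per-head K and V. A back-of-envelope comparison with the defaults above (this is intuition only, not MAX's allocator logic):

```python
# Per-token, per-layer KV-cache entries under MLA vs. uncompressed MHA,
# using the config defaults in this section.
kv_lora_rank, qk_rope_head_dim = 512, 64
num_kv_heads, qk_nope_head_dim, v_head_dim = 128, 128, 128

# MLA caches the compressed KV latent plus the shared RoPE key component.
mla_entries = kv_lora_rank + qk_rope_head_dim  # 576

# Uncompressed MHA would cache full K (nope + rope parts) and V per head.
mha_entries = num_kv_heads * ((qk_nope_head_dim + qk_rope_head_dim) + v_head_dim)

ratio = mha_entries / mla_entries  # roughly 71x smaller cache per token
```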
kv_params
kv_params: KVCacheParamInterface
max_batch_context_length
max_batch_context_length: int = 131072
max_position_embeddings
max_position_embeddings: int = 4096
Maximum positional embeddings as defined by the original model.
max_seq_len
max_seq_len: int = 163840
Maximum sequence length as defined by the MAX Engine pipeline configuration.
moe_intermediate_size
moe_intermediate_size: int = 2048
moe_layer_freq
moe_layer_freq: int = 1
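first_k_dense_replace and moe_layer_freq together determine which decoder layers use MoE. Assuming the common DeepSeek rule (layer i is MoE when i >= first_k_dense_replace and i is a multiple of moe_layer_freq), the defaults give:

```python
# MoE layer schedule with the defaults above: the first 3 layers keep a
# dense FFN, every remaining layer (moe_layer_freq=1) is MoE.
num_hidden_layers, first_k_dense_replace, moe_layer_freq = 61, 3, 1
moe_layers = [i for i in range(num_hidden_layers)
              if i >= first_k_dense_replace and i % moe_layer_freq == 0]
# Layers 0-2 stay dense; layers 3-60 (58 layers) are MoE.
```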
n_group
n_group: int = 8
n_routed_experts
n_routed_experts: int = 256
n_shared_experts
n_shared_experts: int = 1
norm_dtype
norm_dtype: DType = bfloat16
norm_topk_prob
norm_topk_prob: bool = True
num_attention_heads
num_attention_heads: int = 128
num_experts_per_tok
num_experts_per_tok: int = 8
num_hidden_layers
num_hidden_layers: int = 61
num_key_value_heads
num_key_value_heads: int = 128
q_lora_rank
q_lora_rank: int = 1536
qk_nope_head_dim
qk_nope_head_dim: int = 128
qk_rope_head_dim
qk_rope_head_dim: int = 64
quant_config
quant_config: QuantConfig | None = None
return_hidden_states
return_hidden_states: ReturnHiddenStates = 'none'
Whether to return hidden states and which type (none, last, all, last_normalized, all_normalized).
return_logits
return_logits: ReturnLogits = 'last_token'
Whether to return the last token, all logits, or a variable number of logits.
rms_norm_eps
rms_norm_eps: float = 1e-06
rope_interleave
rope_interleave: bool = True
rope_scaling
rope_scaling: dict[str, Any] | None = None
rope_theta
rope_theta: float = 10000.0
routed_scaling_factor
routed_scaling_factor: float = 2.5
scoring_func
scoring_func: str = 'sigmoid'
tie_word_embeddings
tie_word_embeddings: bool = False
topk_group
topk_group: int = 4
topk_method
topk_method: str = 'greedy'
use_subgraphs
use_subgraphs: bool = True
v_head_dim
v_head_dim: int = 128
vocab_size
vocab_size: int = 129280
DeepseekV3Inputs
class max.pipelines.architectures.deepseekV3.DeepseekV3Inputs(tokens, input_row_offsets, signal_buffers, host_input_row_offsets, batch_context_lengths, *, kv_cache_inputs=None, lora_ids=None, lora_ranks=None, hidden_states=None, return_n_logits, data_parallel_splits, ep_inputs=())
Bases: DeepseekV2Inputs
A class representing inputs for the DeepseekV3 model.
-
Parameters:
-
- tokens (Buffer)
- input_row_offsets (Buffer)
- signal_buffers (list[Buffer])
- host_input_row_offsets (Buffer)
- batch_context_lengths (list[Buffer])
- kv_cache_inputs (KVCacheInputs[Buffer, Buffer] | None)
- lora_ids (Buffer | None)
- lora_ranks (Buffer | None)
- hidden_states (Buffer | list[Buffer] | None)
- return_n_logits (Buffer)
- data_parallel_splits (Buffer)
- ep_inputs (tuple[Buffer, ...])
batch_context_lengths
batch_context_lengths: list[Buffer]
List of tensors containing the context length of each batch.
buffers
Returns positional Buffer inputs for model ABI calls.
data_parallel_splits
data_parallel_splits: Buffer
Tensor containing the data parallel splits for the MLA layer.
ep_inputs
ep_inputs: tuple[Buffer, ...]
Expert-parallel communication buffers (atomic counters and device pointers).
host_input_row_offsets
host_input_row_offsets: Buffer
Tensor containing the host input row offsets.
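The tokens/input_row_offsets pair is a ragged-batch layout: variable-length sequences are flattened into one token buffer, and a prefix-sum of lengths marks where each sequence starts. A small sketch of this layout (assumption: the standard prefix-sum convention, not taken from the MAX source):

```python
# Flatten a ragged batch into `tokens` plus `input_row_offsets`.
batch = [[101, 7, 9], [42], [5, 6]]  # three sequences of different lengths

tokens = [t for seq in batch for t in seq]
offsets = [0]
for seq in batch:
    offsets.append(offsets[-1] + len(seq))

# Sequence i occupies tokens[offsets[i]:offsets[i + 1]].
```

This avoids padding: the total buffer size is the sum of sequence lengths, and offsets has batch_size + 1 entries.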
DeepseekV3Model
class max.pipelines.architectures.deepseekV3.DeepseekV3Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE)
Bases: AlwaysSignalBuffersMixin, DeepseekV2Model
A DeepseekV3 model.
-
Parameters:
-
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
estimate_activation_memory()
classmethod estimate_activation_memory(pipeline_config, huggingface_config)
Estimates the activation memory required for model execution.
This accounts for temporary memory buffers used during model execution, such as intermediate activations and working buffers.
-
Parameters:
-
- pipeline_config (PipelineConfig) – Pipeline configuration
- huggingface_config (AutoConfig) – HuggingFace model configuration
-
Returns:
-
Estimated activation memory in bytes
-
Return type:
estimate_weights_size()
classmethod estimate_weights_size(pipeline_config)
Calculates the estimated memory consumption of the model's weights.
-
Parameters:
-
pipeline_config (PipelineConfig)
-
Return type:
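For intuition about what estimate_weights_size() must account for, a back-of-envelope parameter count from the config defaults in this page is sketched below. This is hand-rolled arithmetic for illustration only, not the method's actual logic (which works from the checkpoint metadata); it lands near DeepSeek-V3's published 671B total.

```python
# Rough DeepSeek-V3 parameter count from the config defaults above.
hidden, vocab, n_layers = 7168, 129280, 61
n_dense = 3                      # first_k_dense_replace
n_moe = n_layers - n_dense       # 58 MoE layers
inter, moe_inter = 18432, 2048   # dense / per-expert FFN widths
n_experts, n_shared = 256, 1
heads = 128
q_lora, kv_lora = 1536, 512
qk_rope, qk_nope, v_dim = 64, 128, 128

# Embeddings plus untied LM head (tie_word_embeddings=False).
embed = 2 * vocab * hidden

def ffn(h, i):
    # Gated (SwiGLU-style) FFN: gate, up, and down projections.
    return 3 * h * i

dense_ffn = n_dense * ffn(hidden, inter)
moe_ffn = n_moe * (n_experts + n_shared) * ffn(hidden, moe_inter)

# MLA attention: low-rank Q/KV down-projections, up-projections, output proj.
attn = n_layers * (
    hidden * q_lora
    + q_lora * heads * (qk_rope + qk_nope)
    + hidden * (kv_lora + qk_rope)
    + kv_lora * heads * (qk_nope + v_dim)
    + heads * v_dim * hidden
)

total = embed + dense_ffn + moe_ffn + attn
print(f"~{total / 1e9:.0f}B parameters")  # the MoE FFNs dominate the total
```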
execute()
execute(model_inputs)
Executes the graph with the given inputs.
This is an abstract method that must be implemented by concrete PipelineModels to define their specific execution logic.
-
Parameters:
-
model_inputs (ModelInputs) – The model inputs to execute, containing tensors and any other required data for model execution.
-
Returns:
-
ModelOutputs containing the pipeline’s output tensors.
-
Return type:
get_kv_params()
classmethod get_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)
Returns the KV cache params for the pipeline model.
-
Parameters:
-
- huggingface_config (AutoConfig)
- pipeline_config (PipelineConfig)
- devices (list[DeviceRef])
- kv_cache_config (KVCacheConfig)
- cache_dtype (DType)
-
Return type:
load_model()
load_model(session)
Load the model with the given weights.
-
Parameters:
-
session (InferenceSession)
-
Return type:
prepare_initial_token_inputs()
prepare_initial_token_inputs(replica_batches, kv_cache_inputs=None, return_n_logits=1)
Prepares the initial inputs to be passed to execute().
The inputs and functionality can vary per model. For example, model
inputs could include encoded tensors, unique IDs per tensor when using
a KV cache manager, and kv_cache_inputs (or None if the model does
not use KV cache). This method typically batches encoded tensors,
claims a KV cache slot if needed, and returns the inputs and caches.
-
Parameters:
-
- replica_batches (Sequence[Sequence[TextContext]])
- kv_cache_inputs (KVCacheInputs[Buffer, Buffer] | None)
- return_n_logits (int)
-
Return type:
prepare_next_token_inputs()
prepare_next_token_inputs(next_tokens, prev_model_inputs)
Prepares the secondary inputs to be passed to execute().
While prepare_initial_token_inputs manages the initial inputs, this function updates them for each step of a multi-step execution pattern.
-
Parameters:
-
- next_tokens (Buffer)
- prev_model_inputs (ModelInputs)
-
Return type:
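The two-phase input preparation above follows a common pattern: build inputs once for the prompt, then cheaply refresh them with each step's new tokens. The toy sketch below shows only the control-flow shape; the classes and token logic are stand-ins, not the real max.pipelines API.

```python
class ToyModel:
    """Stand-in illustrating the initial/next input-preparation pattern."""

    def prepare_initial_token_inputs(self, batch):
        # Build the full input structure once for the prompt batch.
        return {"tokens": batch, "step": 0}

    def prepare_next_token_inputs(self, next_tokens, prev_inputs):
        # Cheap per-step update: swap in new tokens, bump the step counter.
        return {"tokens": next_tokens, "step": prev_inputs["step"] + 1}

    def execute(self, inputs):
        # Pretend decoding: emit one new token id per sequence per step.
        if inputs["step"] == 0:
            return [seq[-1] + 1 for seq in inputs["tokens"]]
        return [tok + 1 for tok in inputs["tokens"]]

model = ToyModel()
inputs = model.prepare_initial_token_inputs([[1, 2], [7]])
for _ in range(3):  # multi-step execution: execute, then refresh inputs
    next_tokens = model.execute(inputs)
    inputs = model.prepare_next_token_inputs(next_tokens, inputs)
```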