Python module
max.pipelines.architectures.glm5_1
GLM-5.1 (GlmMoeDsa) mixture-of-experts architecture for text generation.
Glm5_1Config
class max.pipelines.architectures.glm5_1.Glm5_1Config(*, dtype, kv_params, devices, use_subgraphs=True, data_parallel_degree=1, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='greedy', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, norm_dtype=bfloat16, gate_dtype=None, correction_bias_dtype=None, max_batch_context_length=131072, quant_config=None, dense_mlp_layers_without_quant=frozenset({}), ep_config=None, graph_mode='auto', return_logits=ReturnLogits.LAST_TOKEN, return_hidden_states=ReturnHiddenStates.NONE, eagle_aux_hidden_state_layer_ids=None, index_head_dim=128, index_n_heads=64, index_topk=2048)
Bases: DeepseekV3_2Config
Configuration for GLM-5.1 models.
This class is a skeleton alias of DeepseekV3_2Config until GLM-specific bring-up diverges from DeepSeek-V3.2.
Parameters:
- dtype (DType)
- kv_params (KVCacheParamInterface)
- devices (list[DeviceRef])
- use_subgraphs (bool)
- data_parallel_degree (int)
- vocab_size (int)
- hidden_size (int)
- intermediate_size (int)
- moe_intermediate_size (int)
- moe_layer_freq (int)
- num_hidden_layers (int)
- num_attention_heads (int)
- num_key_value_heads (int)
- n_shared_experts (int)
- n_routed_experts (int)
- routed_scaling_factor (float)
- kv_lora_rank (int)
- q_lora_rank (int)
- qk_rope_head_dim (int)
- v_head_dim (int)
- qk_nope_head_dim (int)
- topk_method (str)
- n_group (int)
- topk_group (int)
- num_experts_per_tok (int)
- first_k_dense_replace (int)
- norm_topk_prob (bool)
- hidden_act (str)
- max_position_embeddings (int)
- max_seq_len (int)
- rms_norm_eps (float)
- tie_word_embeddings (bool)
- rope_theta (float)
- rope_scaling (dict[str, Any] | None)
- rope_interleave (bool)
- scoring_func (str)
- attention_bias (bool)
- attention_dropout (float)
- norm_dtype (DType)
- gate_dtype (DType | None)
- correction_bias_dtype (DType | None)
- max_batch_context_length (int)
- quant_config (QuantConfig | None)
- dense_mlp_layers_without_quant (frozenset[int])
- ep_config (EPConfig | None)
- graph_mode (str)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
- eagle_aux_hidden_state_layer_ids (list[int] | None)
- index_head_dim (int)
- index_n_heads (int)
- index_topk (int)
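The defaults in the signature above imply a few derived sizes. The following is a minimal sketch in plain Python that computes them. It assumes the fields keep their DeepSeek-V3 meanings (for example, that `first_k_dense_replace` counts the leading dense layers and that the per-head query/key width is the sum of the `nope` and `rope` parts), which this page does not state explicitly. The `defaults` dict and the `derived_sizes` helper are hypothetical names used for illustration and are not part of the MAX API.

```python
# Default values copied from the Glm5_1Config signature.
defaults = {
    "num_hidden_layers": 61,
    "first_k_dense_replace": 3,
    "moe_layer_freq": 1,
    "n_routed_experts": 256,
    "n_shared_experts": 1,
    "num_experts_per_tok": 8,
    "n_group": 8,
    "qk_nope_head_dim": 128,
    "qk_rope_head_dim": 64,
}


def derived_sizes(cfg: dict) -> dict:
    """Compute sizes implied by the config (assumes DeepSeek-V3 semantics)."""
    # Assumption: layer i is MoE when i >= first_k_dense_replace and
    # i % moe_layer_freq == 0; every other layer uses a dense MLP.
    moe_layers = sum(
        1
        for i in range(cfg["num_hidden_layers"])
        if i >= cfg["first_k_dense_replace"] and i % cfg["moe_layer_freq"] == 0
    )
    return {
        "qk_head_dim": cfg["qk_nope_head_dim"] + cfg["qk_rope_head_dim"],
        "moe_layers": moe_layers,
        "dense_layers": cfg["num_hidden_layers"] - moe_layers,
        "experts_per_group": cfg["n_routed_experts"] // cfg["n_group"],
        "active_experts_per_token": cfg["num_experts_per_tok"]
        + cfg["n_shared_experts"],
    }


sizes = derived_sizes(defaults)
print(sizes)
```

Under these assumptions the defaults give a 192-wide query/key head, 58 MoE layers after 3 dense layers, 32 routed experts per group, and 9 experts active per token (8 routed plus 1 shared).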
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initialize config, mapping GLM default RoPE to rope_scaling=None.
Parameters:
- pipeline_config (PipelineConfig)
- model_config (MAXModelConfig | None)
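The description of `initialize()` says that GLM's default RoPE setting is mapped to `rope_scaling=None`. A minimal sketch of that kind of normalization follows. The shape of the incoming value (a Hugging Face style dict whose `rope_type` or `type` key equals `"default"`) is an assumption, and `normalize_rope_scaling` is a hypothetical helper rather than a MAX function.

```python
from typing import Any, Optional


def normalize_rope_scaling(
    rope_scaling: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Return None when the dict only describes default (unscaled) RoPE."""
    if rope_scaling is None:
        return None
    # Assumption: the RoPE variant is stored under "rope_type" or "type".
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    if rope_type == "default":
        return None
    # Any other variant (for example YaRN) is passed through untouched.
    return rope_scaling


print(normalize_rope_scaling({"rope_type": "default", "rope_theta": 10000.0}))
print(normalize_rope_scaling({"rope_type": "yarn", "factor": 4.0}))
```

Collapsing the default case to `None` lets downstream code treat "no scaling" as a single condition instead of checking for both a missing value and a `"default"` entry.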
Return type:
Glm5_1Model
class max.pipelines.architectures.glm5_1.Glm5_1Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE)
Bases: DeepseekV3_2Model
GLM-5.1 pipeline model.
This class is a skeleton alias of DeepseekV3_2Model until GLM-specific bring-up diverges from DeepSeek-V3.2.
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
model_config_cls
model_config_cls
alias of Glm5_1Config
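Both classes on this page are described as skeleton aliases, and `model_config_cls` ties the model to its config. The pattern can be sketched with stand-in classes. `BaseConfig` and `BaseModel` below are hypothetical placeholders for `DeepseekV3_2Config` and `DeepseekV3_2Model`, whose real definitions are not shown here, and `build_config` is an invented method used only to show the lookup. The real `Glm5_1Config` also overrides `initialize()`, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    """Stand-in for DeepseekV3_2Config (hypothetical, two fields only)."""

    hidden_size: int = 7168
    index_topk: int = 2048


class BaseModel:
    """Stand-in for DeepseekV3_2Model (hypothetical)."""

    model_config_cls = BaseConfig

    def build_config(self, **overrides):
        # Instantiate whichever config class the (sub)class points at.
        return self.model_config_cls(**overrides)


@dataclass
class GlmConfig(BaseConfig):
    """Skeleton alias: inherits the stand-in's fields unchanged."""


class GlmModel(BaseModel):
    """Skeleton alias that only swaps in its own config class."""

    model_config_cls = GlmConfig


config = GlmModel().build_config(hidden_size=1024)
print(type(config).__name__, config.hidden_size, config.index_topk)
```

Because the base class resolves `model_config_cls` through `self`, the subclass gets its own config type without reimplementing any model logic, which is what makes a skeleton alias cheap to maintain until the two architectures diverge.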