Python module

max.pipelines.architectures.glm5_1

GLM-5.1 (GlmMoeDsa) mixture-of-experts architecture for text generation.

Glm5_1Config

class max.pipelines.architectures.glm5_1.Glm5_1Config(*, dtype, kv_params, devices, use_subgraphs=True, data_parallel_degree=1, vocab_size=129280, hidden_size=7168, intermediate_size=18432, moe_intermediate_size=2048, moe_layer_freq=1, num_hidden_layers=61, num_attention_heads=128, num_key_value_heads=128, n_shared_experts=1, n_routed_experts=256, routed_scaling_factor=2.5, kv_lora_rank=512, q_lora_rank=1536, qk_rope_head_dim=64, v_head_dim=128, qk_nope_head_dim=128, topk_method='greedy', n_group=8, topk_group=4, num_experts_per_tok=8, first_k_dense_replace=3, norm_topk_prob=True, hidden_act='silu', max_position_embeddings=4096, max_seq_len=163840, rms_norm_eps=1e-06, tie_word_embeddings=False, rope_theta=10000.0, rope_scaling=None, rope_interleave=True, scoring_func='sigmoid', attention_bias=False, attention_dropout=0.0, norm_dtype=bfloat16, gate_dtype=None, correction_bias_dtype=None, max_batch_context_length=131072, quant_config=None, dense_mlp_layers_without_quant=frozenset({}), ep_config=None, graph_mode='auto', return_logits=ReturnLogits.LAST_TOKEN, return_hidden_states=ReturnHiddenStates.NONE, eagle_aux_hidden_state_layer_ids=None, index_head_dim=128, index_n_heads=64, index_topk=2048)

Bases: DeepseekV3_2Config

Configuration for GLM-5.1 models.

This class is currently a thin alias of DeepseekV3_2Config and will remain one until GLM-specific bring-up diverges from DeepSeek-V3.2.

Parameters:

  • dtype (DType)
  • kv_params (KVCacheParamInterface)
  • devices (list[DeviceRef])
  • use_subgraphs (bool)
  • data_parallel_degree (int)
  • vocab_size (int)
  • hidden_size (int)
  • intermediate_size (int)
  • moe_intermediate_size (int)
  • moe_layer_freq (int)
  • num_hidden_layers (int)
  • num_attention_heads (int)
  • num_key_value_heads (int)
  • n_shared_experts (int)
  • n_routed_experts (int)
  • routed_scaling_factor (float)
  • kv_lora_rank (int)
  • q_lora_rank (int)
  • qk_rope_head_dim (int)
  • v_head_dim (int)
  • qk_nope_head_dim (int)
  • topk_method (str)
  • n_group (int)
  • topk_group (int)
  • num_experts_per_tok (int)
  • first_k_dense_replace (int)
  • norm_topk_prob (bool)
  • hidden_act (str)
  • max_position_embeddings (int)
  • max_seq_len (int)
  • rms_norm_eps (float)
  • tie_word_embeddings (bool)
  • rope_theta (float)
  • rope_scaling (dict[str, Any] | None)
  • rope_interleave (bool)
  • scoring_func (str)
  • attention_bias (bool)
  • attention_dropout (float)
  • norm_dtype (DType)
  • gate_dtype (DType | None)
  • correction_bias_dtype (DType | None)
  • max_batch_context_length (int)
  • quant_config (QuantConfig | None)
  • dense_mlp_layers_without_quant (frozenset[int])
  • ep_config (EPConfig | None)
  • graph_mode (str)
  • return_logits (ReturnLogits)
  • return_hidden_states (ReturnHiddenStates)
  • eagle_aux_hidden_state_layer_ids (list[int] | None)
  • index_head_dim (int)
  • index_n_heads (int)
  • index_topk (int)
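
The defaults in the signature imply a few derived sizes. The sketch below recomputes them with a plain dataclass that copies a handful of the documented defaults. `ConfigDefaults` and `derived_sizes` are hypothetical stand-ins for illustration, not part of max.pipelines, and the reading of each field (per-head query/key width as the sum of the non-RoPE and RoPE parts, the first `first_k_dense_replace` layers being dense) assumes the usual DeepSeek-V3 conventions rather than anything stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigDefaults:
    """Hypothetical stand-in holding a few defaults copied from the signature."""

    num_hidden_layers: int = 61
    first_k_dense_replace: int = 3
    moe_layer_freq: int = 1
    n_routed_experts: int = 256
    n_group: int = 8
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64


def derived_sizes(cfg: ConfigDefaults) -> dict[str, int]:
    """Compute sizes implied by the config (assumes DeepSeek-V3 conventions)."""
    # Count layers past the leading dense block that fall on the MoE frequency.
    moe_layers = sum(
        1
        for i in range(cfg.num_hidden_layers)
        if i >= cfg.first_k_dense_replace and i % cfg.moe_layer_freq == 0
    )
    return {
        "qk_head_dim": cfg.qk_nope_head_dim + cfg.qk_rope_head_dim,
        "moe_layers": moe_layers,
        "dense_layers": cfg.num_hidden_layers - moe_layers,
        "experts_per_group": cfg.n_routed_experts // cfg.n_group,
    }


sizes = derived_sizes(ConfigDefaults())
print(sizes)
```

Under these assumptions the defaults give a 192-wide query/key head, 58 MoE layers after 3 dense layers, and 32 routed experts per group.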

initialize()

classmethod initialize(pipeline_config, model_config=None)

Initializes the config, mapping GLM's default RoPE to rope_scaling=None.

Parameters:

  • pipeline_config
  • model_config (default: None)

Return type:

Self
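
The mapping described for initialize() can be illustrated with a small normalization helper. This is a sketch only: `normalize_rope_scaling` is a hypothetical function, and the assumption that an upstream config marks default RoPE with a `rope_type` (or older `type`) key set to "default" comes from common Hugging Face config conventions, not from this page or the MAX source.

```python
from typing import Any, Optional


def normalize_rope_scaling(
    rope_scaling: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Return None for default RoPE, otherwise pass the dict through."""
    if rope_scaling is None:
        return None
    # Assumed key names; "type" is treated as an older spelling of "rope_type".
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    if rope_type == "default":
        return None
    return rope_scaling


print(normalize_rope_scaling({"rope_type": "default"}))
print(normalize_rope_scaling({"rope_type": "yarn", "factor": 4.0}))
```

Collapsing the "default" case to None means downstream code only has to check for one sentinel value when deciding whether any RoPE scaling applies.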

Glm5_1Model

class max.pipelines.architectures.glm5_1.Glm5_1Model(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.ALL, return_hidden_states=ReturnHiddenStates.NONE)

Bases: DeepseekV3_2Model

GLM-5.1 pipeline model.

This class is currently a thin alias of DeepseekV3_2Model and will remain one until GLM-specific bring-up diverges from DeepSeek-V3.2.

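Parameters:

  • pipeline_config
  • session
  • devices
  • kv_cache_config
  • weights
  • adapter (default: None)
  • return_logits (default: ReturnLogits.ALL)
  • return_hidden_states (default: ReturnHiddenStates.NONE)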

model_config_cls

model_config_cls

alias of Glm5_1Config
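
The alias pattern described for both classes can be sketched with stand-in classes. None of the names below are the real MAX classes: `BaseConfig`, `BaseModel`, `AliasConfig`, and `AliasModel` are hypothetical and only show how a near-empty subclass inherits behavior while a class attribute such as model_config_cls points at the matching config class.

```python
class BaseConfig:
    """Hypothetical stand-in for the parent config class."""

    def __init__(self, hidden_size: int = 7168) -> None:
        self.hidden_size = hidden_size


class BaseModel:
    """Hypothetical stand-in for the parent pipeline model."""

    model_config_cls: type[BaseConfig] = BaseConfig

    def build_config(self) -> BaseConfig:
        # Instantiate whichever config class this (sub)class points at.
        return self.model_config_cls()


class AliasConfig(BaseConfig):
    """Empty subclass: inherits everything until it needs to diverge."""


class AliasModel(BaseModel):
    """Empty apart from rebinding the config class."""

    model_config_cls = AliasConfig


config = AliasModel().build_config()
print(type(config).__name__, config.hidden_size)
```

Keeping the subclass nearly empty gives the new architecture its own importable name while reusing the parent implementation, so later divergence only requires overriding the methods that actually change.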