IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Python module

max.pipelines.architectures.unified_mtp_glm5_2

GLM-5.2 (DeepSeek-V3.2 sparse) MTP draft model for unified speculative decoding.

UnifiedMTPGlm5_2Inputs

class max.pipelines.architectures.unified_mtp_glm5_2.UnifiedMTPGlm5_2Inputs(tokens, input_row_offsets, signal_buffers, host_input_row_offsets, batch_context_lengths, *, kv_cache_inputs=None, lora=None, hidden_states=None, return_n_logits, data_parallel_splits, ep_inputs=(), draft_tokens=None, seed=None, temperature=None, top_k=None, max_k=None, top_p=None, min_top_p=None, in_thinking_phase=None, pinned_bitmask=None, wait_payload=None, device_bitmask_scratch=None, structured_output=False)

Bases: UnifiedSpecDecodeInputs, DeepseekV3Inputs

Inputs for the UnifiedMTPGlm5_2 model.

Target-prefix fields come from DeepseekV3Inputs; the spec-decode fields and trailing buffer packing come from UnifiedSpecDecodeInputs. The MTP graph binds the per-row in_thinking_phase flag (consumed by relaxed acceptance).

Parameters:

buffers

property buffers: tuple[Buffer, ...]

Returns positional Buffer inputs for model ABI calls.
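
As a rough illustration of what "positional Buffer inputs" could mean, the sketch below flattens an inputs object into a fixed-order tuple. It is not the library's implementation: `SketchInputs` is a hypothetical stand-in that borrows four field names from the signature above, the chosen ordering is invented for the example, and the rule that unset (`None`) fields are skipped is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchInputs:
    """Hypothetical stand-in; not the real UnifiedMTPGlm5_2Inputs."""

    # Required fields (names borrowed from the documented signature).
    tokens: Any
    input_row_offsets: Any
    # Optional spec-decode fields; None means "not bound for this call".
    draft_tokens: Optional[Any] = None
    in_thinking_phase: Optional[Any] = None

    @property
    def buffers(self) -> tuple[Any, ...]:
        # Assumed ordering: required fields first, optional fields after.
        ordered = (
            self.tokens,
            self.input_row_offsets,
            self.draft_tokens,
            self.in_thinking_phase,
        )
        # Assumption: fields left as None are dropped from the tuple.
        return tuple(b for b in ordered if b is not None)


inputs = SketchInputs(tokens="t", input_row_offsets="o", in_thinking_phase="f")
positional = inputs.buffers
```

The point of a fixed order is that a compiled graph takes its arguments by position, so the caller and the graph must agree on the order in which fields are packed.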

UnifiedMTPGlm5_2Model

class max.pipelines.architectures.unified_mtp_glm5_2.UnifiedMTPGlm5_2Model(*args, **kwargs)

Bases: _UnifiedSpecDecodeModelMixin, Glm5_1Model

GLM-5.2 with MTP: combines the merge step, the V3.2 target, rejection, and the sparse draft.
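
The page names the draft, target, and rejection stages without describing them. For orientation, here is a minimal sketch of the generic draft-then-verify idea behind speculative decoding, using the simplest (greedy, exact-match) acceptance rule. It is not this model's rejection step or its relaxed-acceptance rule, and `accept_draft_tokens` is a hypothetical helper name.

```python
def accept_draft_tokens(
    draft_tokens: list[int], target_tokens: list[int]
) -> list[int]:
    """Greedy verification of a drafted token run.

    target_tokens[i] is the target model's own choice at position i, given
    the prefix plus draft_tokens[:i]; it has one more entry than the draft.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target must score one more position than the draft")

    accepted: list[int] = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            # First mismatch: emit the target's token and stop.
            accepted.append(target)
            return accepted
        accepted.append(draft)

    # Every draft token matched, so the extra target token comes for free.
    accepted.append(target_tokens[-1])
    return accepted


partial = accept_draft_tokens([5, 7, 9], [5, 7, 2, 4])
full = accept_draft_tokens([5, 7], [5, 7, 3])
```

Each verification pass emits at least one token, so decoding never stalls even when the draft is wrong at the first position.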

batch_processor_cls

batch_processor_cls

alias of UnifiedMTPGlm5_2BatchProcessor

load_model()

load_model(session)

Load the model with the given weights.

Parameters:

session (InferenceSession)

Return type:

Model