Python module

max.pipelines.architectures.unified_mtp_glm5_2

GLM-5.2 (DeepSeek-V3.2 sparse) MTP draft model for unified speculative decoding.

`UnifiedMTPGlm5_2Inputs`

class max.pipelines.architectures.unified_mtp_glm5_2.UnifiedMTPGlm5_2Inputs(tokens, input_row_offsets, signal_buffers, host_input_row_offsets, batch_context_lengths, *, kv_cache_inputs=None, lora=None, hidden_states=None, return_n_logits, data_parallel_splits, ep_inputs=(), draft_tokens=None, seed=None, temperature=None, top_k=None, max_k=None, top_p=None, min_top_p=None, in_thinking_phase=None, pinned_bitmask=None, wait_payload=None, device_bitmask_scratch=None, structured_output=False)

Bases: UnifiedSpecDecodeInputs, DeepseekV3Inputs

Inputs for the UnifiedMTPGlm5_2 model.

Target-prefix fields come from DeepseekV3Inputs; the spec-decode fields and trailing buffer packing come from UnifiedSpecDecodeInputs. The MTP graph binds the per-row in_thinking_phase flag (consumed by relaxed acceptance).

Parameters:

tokens (Buffer)
input_row_offsets (Buffer)
signal_buffers (list[Buffer])
host_input_row_offsets (Buffer)
batch_context_lengths (list[Buffer])
kv_cache_inputs (KVCacheInputsInterface[Buffer, Buffer] | None)
lora (LoRAInputs | None)
hidden_states (Buffer | list[Buffer] | None)
return_n_logits (Buffer)
data_parallel_splits (Buffer)
ep_inputs (tuple[Buffer, ...])
draft_tokens (Buffer | None)
seed (Buffer | None)
temperature (Buffer | None)
top_k (Buffer | None)
max_k (Buffer | None)
top_p (Buffer | None)
min_top_p (Buffer | None)
in_thinking_phase (Buffer | None)
pinned_bitmask (Buffer | None)
wait_payload (Buffer | None)
device_bitmask_scratch (Buffer | None)
structured_output (bool)

`buffers`

property buffers: tuple[Buffer, ...]

Returns positional Buffer inputs for model ABI calls.
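
A property like this typically flattens the named fields into one ordered tuple that can be passed positionally. The sketch below shows that general pattern with hypothetical stand-in types (`FakeBuffer`, `InputsSketch`). Dropping unset optional fields and placing them after the required ones are assumptions made for illustration; the actual packing order and rules are defined by the MAX source.

```python
from dataclasses import dataclass
from typing import Optional


class FakeBuffer:
    """Hypothetical stand-in for a device buffer (not the MAX Buffer type)."""

    def __init__(self, name: str) -> None:
        self.name = name


@dataclass
class InputsSketch:
    tokens: FakeBuffer
    input_row_offsets: FakeBuffer
    draft_tokens: Optional[FakeBuffer] = None
    in_thinking_phase: Optional[FakeBuffer] = None

    @property
    def buffers(self) -> tuple[FakeBuffer, ...]:
        # Required buffers first, in a fixed order.
        required = (self.tokens, self.input_row_offsets)
        # Assumption: optional buffers trail the required ones and are
        # skipped when unset.
        optional = (self.draft_tokens, self.in_thinking_phase)
        return required + tuple(b for b in optional if b is not None)


sketch_inputs = InputsSketch(
    tokens=FakeBuffer("tokens"),
    input_row_offsets=FakeBuffer("input_row_offsets"),
    in_thinking_phase=FakeBuffer("in_thinking_phase"),
)
packed_names = [b.name for b in sketch_inputs.buffers]
print(packed_names)
```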

`UnifiedMTPGlm5_2Model`

class max.pipelines.architectures.unified_mtp_glm5_2.UnifiedMTPGlm5_2Model(*args, **kwargs)

Bases: _UnifiedSpecDecodeModelMixin, Glm5_1Model

GLM-5.2 with MTP. The unified graph combines a merge step, the DeepSeek-V3.2 target, a rejection step, and the sparse draft.

`batch_processor_cls`

batch_processor_cls

alias of UnifiedMTPGlm5_2BatchProcessor
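
A class-level alias like this is a common way for a subclass to choose which helper class gets instantiated. The sketch below shows the general pattern only. All names are hypothetical, and `make_batch_processor` is an invented helper; how MAX actually consumes `batch_processor_cls` is not described on this page.

```python
class BatchProcessorSketch:
    """Hypothetical base batch processor."""

    def describe(self) -> str:
        return "base"


class MTPBatchProcessorSketch(BatchProcessorSketch):
    """Hypothetical MTP-specific batch processor."""

    def describe(self) -> str:
        return "mtp"


class ModelSketch:
    # Class attribute naming the processor type to build.
    batch_processor_cls: type[BatchProcessorSketch] = BatchProcessorSketch

    def make_batch_processor(self) -> BatchProcessorSketch:
        # Attribute lookup goes through the instance's class, so a
        # subclass override is picked up without changing this method.
        return self.batch_processor_cls()


class MTPModelSketch(ModelSketch):
    batch_processor_cls = MTPBatchProcessorSketch


processor = MTPModelSketch().make_batch_processor()
print(processor.describe())
```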

`load_model()`

load_model(session)

Load the model with the given weights.

Parameters: session (InferenceSession)
Return type: Model
