Python module
max.pipelines.architectures.diffusion_gemma
DiffusionGemma block-diffusion architecture.
DiffusionGemmaForBlockDiffusionConfig
class max.pipelines.architectures.diffusion_gemma.DiffusionGemmaForBlockDiffusionConfig(*, devices, dtype, unquantized_dtype=bfloat16, kv_params, image_token_index, video_token_index=262144, text_config, vision_config, tie_word_embeddings=False, canvas_length=256, boi_token_id=255999, eoi_token_id=258882)
Bases: Gemma4ForConditionalGenerationConfig
Top-level MAX config for DiffusionGemma block diffusion.
Extends the Gemma4 multimodal config with the block-diffusion canvas
geometry. KV cache parameters, per-layer-type attention geometry, MoE
sizing, and the vision tower config are all inherited from the donor,
which reads them through the view installed by initialize_from_config.
Parameters:
- devices (list[DeviceRef])
- dtype (DType)
- unquantized_dtype (DType)
- kv_params (MultiKVCacheParams)
- image_token_index (int)
- video_token_index (int)
- text_config (Gemma4TextConfig)
- vision_config (Gemma4VisionConfig | None)
- tie_word_embeddings (bool)
- canvas_length (int)
- boi_token_id (int)
- eoi_token_id (int)
boi_token_id
boi_token_id: int = 255999
Begin-of-image token id wrapping image prompts.
canvas_length
canvas_length: int = 256
Number of tokens denoised per block-diffusion canvas.
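As a rough illustration of what the canvas length implies, the sketch below counts how many canvases a request would need if generation proceeds one full canvas at a time. That scheduling is an assumption rather than something this page states, and canvases_needed is a hypothetical helper; only the default of 256 comes from the signature above.

```python
import math

CANVAS_LENGTH = 256  # default canvas_length from the class signature


def canvases_needed(max_new_tokens: int, canvas_length: int = CANVAS_LENGTH) -> int:
    """Hypothetical helper: canvases required to cover max_new_tokens.

    Assumes each canvas denoises exactly canvas_length tokens and that
    canvases are produced one after another.
    """
    if max_new_tokens < 0:
        raise ValueError("max_new_tokens must be non-negative")
    if canvas_length <= 0:
        raise ValueError("canvas_length must be positive")
    return math.ceil(max_new_tokens / canvas_length)
```

Under these assumptions, a request for 257 new tokens would spill into a second canvas.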
eoi_token_id
eoi_token_id: int = 258882
End-of-image token id wrapping image prompts.
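To make the wrapping concrete, here is a minimal sketch of an image span in a token sequence: a begin-of-image id, a run of image placeholder ids, then an end-of-image id. The helper wrap_image_placeholders and the placeholder value 7 are hypothetical (image_token_index has no default in the signature), and the actual prompt layout is produced by the model's own tokenizer and processor, not by this code.

```python
BOI_TOKEN_ID = 255999  # default boi_token_id from the class signature
EOI_TOKEN_ID = 258882  # default eoi_token_id from the class signature


def wrap_image_placeholders(image_token_index: int, num_image_tokens: int) -> list:
    """Hypothetical helper: return [boi, placeholder * n, eoi]."""
    if num_image_tokens < 0:
        raise ValueError("num_image_tokens must be non-negative")
    return [BOI_TOKEN_ID] + [image_token_index] * num_image_tokens + [EOI_TOKEN_ID]


# 7 is a made-up stand-in for image_token_index, used only for illustration.
tokens = wrap_image_placeholders(image_token_index=7, num_image_tokens=3)
```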
finalize()
finalize(huggingface_config, state_dict, return_logits)
Finalize with state_dict-dependent fields.
Parses quantization config from the weights and finalizes the text sub-config.
Parameters:
- huggingface_config (AutoConfig) – HuggingFace model configuration.
- state_dict (dict[str, WeightData]) – Model weights dictionary.
- return_logits (ReturnLogits) – Return logits configuration.
Return type:
None
initialize_from_config()
classmethod initialize_from_config(pipeline_config, huggingface_config)
Initializes from pipeline and HuggingFace configs.
Fields that depend on the state_dict should be set via finalize().
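The two-phase pattern (build from configs first, fill in weight-dependent fields later) can be sketched with a toy class. ToyConfig below is not the MAX class: it uses plain dicts in place of PipelineConfig, AutoConfig, and WeightData, and the ".weight_scale" key check is a made-up stand-in for quantization parsing. Only the method names and the finalize argument order mirror this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    """Stand-in for a config that is built in two phases."""

    canvas_length: int
    quantization: Optional[str] = None  # unknown until the weights are inspected

    @classmethod
    def initialize_from_config(cls, pipeline_config: dict, huggingface_config: dict) -> "ToyConfig":
        # Phase 1: only fields derivable from the configs alone.
        return cls(canvas_length=huggingface_config.get("canvas_length", 256))

    def finalize(self, huggingface_config: dict, state_dict: dict, return_logits: object) -> None:
        # Phase 2: fields that depend on the weights.
        # Hypothetical rule: a ".weight_scale" entry marks quantized weights.
        has_scales = any(name.endswith(".weight_scale") for name in state_dict)
        self.quantization = "quantized" if has_scales else "none"


cfg = ToyConfig.initialize_from_config({}, {"canvas_length": 128})
before_finalize = cfg.quantization  # still None at this point
cfg.finalize({}, {"layers.0.weight_scale": 1.0}, return_logits=None)
```

Splitting construction this way lets the config be created before any weights are loaded, with state_dict-dependent fields deferred to finalize().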
Parameters:
- pipeline_config (PipelineConfig) – The MAX Engine pipeline configuration.
- huggingface_config (AutoConfig) – Top-level HuggingFace model configuration.
Returns:
A config instance ready for finalization.
Return type:
DiffusionGemmaForBlockDiffusionModel
class max.pipelines.architectures.diffusion_gemma.DiffusionGemmaForBlockDiffusionModel(pipeline_config, session, devices, kv_cache_config, weights, adapter=None, return_logits=ReturnLogits.LAST_TOKEN)
Bases: Gemma3_MultiModalModel
Compiles the vision/encoder/decoder graphs for block diffusion.
Differences from the donor pipeline model:
- load_model uses this port’s weight converters (decoder-canonical checkpoint layout) and compiles an extra decoder graph.
- execute_decoder_step runs one denoise step: canvas K/V are written into the cache slots after each request’s committed length, so calling it repeatedly without advancing context lengths overwrites the same slots (read-only cache semantics from the encoder’s perspective).
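The overwrite behavior described for execute_decoder_step can be illustrated with a toy cache. This is a sketch under simplifying assumptions: the cache is a plain Python list for a single request, entries are strings rather than K/V tensors, and write_canvas is a hypothetical helper, not part of the MAX API.

```python
def write_canvas(cache: list, committed_len: int, canvas_kv: list) -> None:
    """Write canvas entries into the slots right after the committed prefix."""
    end = committed_len + len(canvas_kv)
    if end > len(cache):
        raise ValueError("canvas does not fit in the remaining cache slots")
    # Equal-length slice assignment: replaces slots in place, never grows the list.
    cache[committed_len:end] = canvas_kv


cache = ["p0", "p1", None, None, None, None]  # two committed prompt slots
committed_len = 2  # not advanced between denoise steps

write_canvas(cache, committed_len, ["a0", "a1", "a2"])  # denoise step 1
write_canvas(cache, committed_len, ["b0", "b1", "b2"])  # denoise step 2, same slots
# cache is now ["p0", "p1", "b0", "b1", "b2", None]
```

Because committed_len stays fixed, the second step replaces the first step's canvas entries while the committed prefix is left untouched.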
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
decoder_model
decoder_model: Model
execute_decoder_step()
execute_decoder_step(canvas_tokens, input_row_offsets, sc_logits, sc_enabled, temperature, kv_cache_inputs)
Runs one denoise step.
Returns (sc_logits_out, argmax, topk_probs, topk_idx, entropy);
sc_logits_out is device-resident bf16 and feeds the next step’s
sc_logits; the [N, 64] top-k pair feeds host-side categorical
sampling.
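The host-side categorical sampling mentioned above could look roughly like the following. This is a sketch, not the pipeline's implementation: it assumes the top-k pair has already been copied to the host as nested Python lists, uses k=3 instead of 64 for readability, and sample_from_topk is a hypothetical helper built on the standard library's random.choices.

```python
import random


def sample_from_topk(topk_probs: list, topk_idx: list, rng: random.Random) -> list:
    """Draw one token id per row from that row's top-k probabilities.

    topk_probs and topk_idx each hold N rows of k entries. random.choices
    normalizes the weights, so rows do not need to sum to 1.
    """
    if len(topk_probs) != len(topk_idx):
        raise ValueError("topk_probs and topk_idx must have the same number of rows")
    return [
        rng.choices(idx_row, weights=prob_row, k=1)[0]
        for prob_row, idx_row in zip(topk_probs, topk_idx)
    ]


rng = random.Random(0)  # seeded so repeated runs draw the same samples
topk_probs = [[0.7, 0.2, 0.1], [1.0, 0.0, 0.0]]  # made-up values, N=2, k=3
topk_idx = [[11, 42, 7], [5, 9, 3]]  # made-up token ids
sampled = sample_from_topk(topk_probs, topk_idx, rng)
```

Sampling only among the top-k candidates keeps the device-to-host transfer at [N, 64] values per step instead of a full vocabulary-sized distribution.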
load_model()
load_model(session)
Loads the compiled Gemma3 MultiModal models into the MAX Engine session.
Returns:
A tuple of (vision_model, language_model).
Parameters:
- session (InferenceSession)
Return type: