Python module
max.pipelines.architectures.z_image_modulev3
Z-Image diffusion architecture for image generation.
ZImageArchConfig
class max.pipelines.architectures.z_image_modulev3.ZImageArchConfig(*, pipeline_config: 'PipelineConfig')
Bases: ArchConfig
-
Parameters:
-
pipeline_config (PipelineConfig)
get_max_seq_len()
get_max_seq_len()
Returns the default maximum sequence length for the model.
Subclasses should determine whether this value can be overridden by
setting the --max-length (pipeline_config.model.max_length) flag.
-
Return type:
-
int
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initialize the config from a PipelineConfig.
-
Parameters:
-
- pipeline_config (PipelineConfig) – The pipeline configuration.
- model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.
-
Return type:
pipeline_config
pipeline_config: PipelineConfig
ZImagePipeline
class max.pipelines.architectures.z_image_modulev3.ZImagePipeline(pipeline_config, session, devices, weight_paths, cache_config=None, **kwargs)
Bases: DiffusionPipeline
Diffusion pipeline for Z-Image generation (Qwen3 + transformer + VAE).
-
Parameters:
-
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- weight_paths (list[Path])
- cache_config (DenoisingCacheConfig | None)
- kwargs (Any)
build_decode_latents()
build_decode_latents()
-
Return type:
-
None
build_prepare_scheduler()
build_prepare_scheduler()
-
Return type:
-
None
build_preprocess_latents()
build_preprocess_latents()
-
Return type:
-
None
build_scheduler_step()
build_scheduler_step()
-
Return type:
-
None
components
components: dict[str, type[ComponentModel]] | None = {'text_encoder': <class 'max.pipelines.architectures.qwen3_modulev3.text_encoder.model.Qwen3TextEncoderZImageModel'>, 'transformer': <class 'max.pipelines.architectures.z_image_modulev3.model.ZImageTransformerModel'>, 'vae': <class 'max.pipelines.architectures.autoencoders_modulev3.autoencoder_kl.AutoencoderKLModel'>}
create_cache_state()
create_cache_state(batch_size, seq_len, transformer_config, text_seq_len=0)
Allocate FBCache / Taylor tensors using Z-Image output layout.
decode_latents()
decode_latents(latents, h_carrier, w_carrier, output_type='np')
Decode packed latents into image output.
default_num_inference_steps
default_num_inference_steps: int = 50
Default number of denoising steps when the user does not specify one.
Subclasses may override this to provide a model-appropriate default.
default_residual_threshold
default_residual_threshold: float = 0.06
Model-specific default for the FBCache relative difference threshold.
Subclasses may override this to provide a model-appropriate default.
Used when the request does not specify a residual_threshold.
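The FBCache check that default_residual_threshold feeds can be sketched as follows. This is a hypothetical, self-contained illustration of the general first-block-caching idea (reuse cached transformer output when the first block's residual barely changed), not the compiled graph's actual implementation; the function name and vector representation are assumptions.

```python
def should_reuse_cached_blocks(new_residual, cached_residual, threshold=0.06):
    """Hypothetical FBCache-style check: if the first block's residual
    changed little relative to the cached one (relative L2 difference
    below the threshold), later transformer blocks can be skipped and
    the cached output reused for this denoising step."""
    diff = sum((a - b) ** 2 for a, b in zip(new_residual, cached_residual)) ** 0.5
    norm = sum(b ** 2 for b in cached_residual) ** 0.5
    if norm == 0.0:
        # No meaningful baseline to compare against; run the full model.
        return False
    return diff / norm < threshold
```

A request-supplied residual_threshold would replace the 0.06 default shown here.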
execute()
execute(model_inputs, output_type='np')
Run the Z-Image denoising loop and decode outputs.
-
Parameters:
-
- model_inputs (ZImageModelInputs)
- output_type (Literal['np', 'latent', 'pil'])
-
Return type:
-
ZImagePipelineOutput
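The loop structure that execute() implies (one transformer call per denoising step, then a scheduler update) can be sketched with a toy stand-in for the compiled transformer. Everything here is illustrative: predict_noise, the list-based latents, and the Euler-style update are assumptions, not the pipeline's real tensors or graphs.

```python
def toy_denoise(latents, predict_noise, sigmas):
    """Toy sketch of a denoising loop: at each step, call the model
    (predict_noise stands in for the compiled transformer graph) and
    apply an Euler-style update with the step delta dt."""
    for i in range(len(sigmas) - 1):
        dt = sigmas[i + 1] - sigmas[i]  # negative: sigma decreases
        noise_pred = predict_noise(latents, sigmas[i])
        latents = [x + n * dt for x, n in zip(latents, noise_pred)]
    return latents
```

In the real pipeline the per-step work is split across the compiled subgraphs built by build_scheduler_step() and run_transformer().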
init_remaining_components()
init_remaining_components()
Initialize derived attributes and compiled subgraphs.
-
Return type:
-
None
prepare_img2img_latents()
prepare_img2img_latents(noise_latents, image_tensor, sigmas)
prepare_inputs()
prepare_inputs(context)
Convert a PixelGenerationContext into model inputs with device tensors.
-
Parameters:
-
context (PixelContext)
-
Return type:
-
ZImageModelInputs
prepare_prompt_embeddings()
prepare_prompt_embeddings(tokens, num_images_per_prompt)
Encode prompt tokens into text embeddings.
prepare_scheduler()
static prepare_scheduler(sigmas)
Precompute denoising timesteps and step deltas.
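A minimal sketch of what "precompute denoising timesteps and step deltas" can look like, assuming a linear flow-matching sigma schedule from 1.0 down to 0.0. The actual schedule used by prepare_scheduler() may differ; this only illustrates the sigmas-to-deltas relationship.

```python
def prepare_sigmas(num_steps):
    """Hypothetical linear sigma schedule: num_steps + 1 sigma values
    from 1.0 down to 0.0, plus the num_steps per-step deltas that a
    scheduler_step-style update would consume."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    dts = [sigmas[i + 1] - sigmas[i] for i in range(num_steps)]
    return sigmas, dts
```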
preprocess_latents()
preprocess_latents(latents, dtype)
Patchify and pack latents before denoising.
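The patchify step can be sketched on a plain 2D grid: each patch x patch tile of the latent becomes one flattened token. This is a simplified illustration (single channel, nested lists); the real method operates on device tensors and also handles the channel dimension and dtype.

```python
def patchify(latent, patch=2):
    """Sketch of patchify/pack: split an (H, W) latent grid into
    patch x patch tiles, flattening each tile into one token, so the
    result has (H // patch) * (W // patch) tokens of patch * patch values."""
    h, w = len(latent), len(latent[0])
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tokens.append([latent[i + di][j + dj]
                           for di in range(patch) for dj in range(patch)])
    return tokens
```

decode_latents() performs the inverse at the end of the pipeline, unpacking tokens back into a spatial layout before the VAE decode.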
run_transformer()
run_transformer(cache_state, **kwargs)
Run the transformer for one denoising step.
Subclasses must override this to call their transformer with the
appropriate model-specific arguments. The method should return
(noise_pred,) when first_block_caching is disabled, or
(new_residual, noise_pred) when first_block_caching is enabled.
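The tuple contract above can be illustrated from the caller's side. This unpacking helper is hypothetical (the pipeline does not expose such a function); it only shows the two return shapes run_transformer() documents.

```python
def unpack_transformer_outputs(outputs, first_block_caching):
    """Sketch of the run_transformer() return contract:
    (noise_pred,) when first_block_caching is disabled, or
    (new_residual, noise_pred) when it is enabled."""
    if first_block_caching:
        new_residual, noise_pred = outputs
        return noise_pred, new_residual
    (noise_pred,) = outputs
    return noise_pred, None
```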
scheduler_step()
static scheduler_step(latents, noise_pred, dt)
text_encoder
text_encoder: Qwen3TextEncoderZImageModel
transformer
transformer: ZImageTransformerModel
unprefixed_weight_component
When set, weight files without a <component>/ prefix are assigned to
this component. This supports multi-repo layouts where quantized weights
for one component (e.g. the transformer) are shipped as flat files in a
separate repo while the remaining components use the base model repo.
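The routing rule described above can be sketched as a small path-dispatch function. The function name and signature are assumptions made for illustration; only the prefix-or-fallback behavior comes from the description.

```python
def route_weight_file(path, components, unprefixed_component=None):
    """Sketch of unprefixed_weight_component routing: weight files named
    '<component>/...' go to that component; flat files (no recognized
    prefix) fall back to the designated unprefixed component, if any."""
    head, _, rest = path.partition("/")
    if rest and head in components:
        return head
    return unprefixed_component
```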
vae
vae: AutoencoderKLModel
ZImageTransformerModel
class max.pipelines.architectures.z_image_modulev3.ZImageTransformerModel(config, encoding, devices, weights, *, cache_config=None)
Bases: ComponentModel
Component wrapper for the compiled Z-Image transformer graph.
-
Parameters:
-
- config
- encoding
- devices
- weights
- cache_config
load_model()
load_model()
Load and return a runtime model instance.
-
Return type:
-
None
model