Python module

max.pipelines.architectures.z_image_modulev3

Z-Image diffusion architecture for image generation.

ZImageArchConfig

class max.pipelines.architectures.z_image_modulev3.ZImageArchConfig(*, pipeline_config: 'PipelineConfig')

Bases: ArchConfig

Parameters:

pipeline_config (PipelineConfig)

get_max_seq_len()

get_max_seq_len()

Returns the default maximum sequence length for the model.

Subclasses should determine whether this value can be overridden via the --max-length flag (pipeline_config.model.max_length).

Return type:

int

initialize()

classmethod initialize(pipeline_config, model_config=None)

Initialize the config from a PipelineConfig.

Parameters:

  • pipeline_config (PipelineConfig) – The pipeline configuration.
  • model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.

Return type:

Self

pipeline_config

pipeline_config: PipelineConfig

ZImagePipeline

class max.pipelines.architectures.z_image_modulev3.ZImagePipeline(pipeline_config, session, devices, weight_paths, cache_config=None, **kwargs)

Bases: DiffusionPipeline

Diffusion pipeline for Z-Image generation (Qwen3 + transformer + VAE).

build_decode_latents()

build_decode_latents()

Return type:

None

build_prepare_scheduler()

build_prepare_scheduler()

Return type:

None

build_preprocess_latents()

build_preprocess_latents()

Return type:

None

build_scheduler_step()

build_scheduler_step()

Return type:

None

components

components: dict[str, type[ComponentModel]] | None

  • 'text_encoder': max.pipelines.architectures.qwen3_modulev3.text_encoder.model.Qwen3TextEncoderZImageModel
  • 'transformer': max.pipelines.architectures.z_image_modulev3.model.ZImageTransformerModel
  • 'vae': max.pipelines.architectures.autoencoders_modulev3.autoencoder_kl.AutoencoderKLModel

create_cache_state()

create_cache_state(batch_size, seq_len, transformer_config, text_seq_len=0)

Allocate FBCache / Taylor tensors using Z-Image output layout.

Parameters:

  • batch_size (int)
  • seq_len (int)
  • transformer_config (Any)
  • text_seq_len (int)

Return type:

DenoisingCacheState

decode_latents()

decode_latents(latents, h_carrier, w_carrier, output_type='np')

Decode packed latents into image output.

Return type:

Tensor | ndarray

default_num_inference_steps

default_num_inference_steps: int = 50

Default number of denoising steps when the user does not specify one.

Subclasses may override this to provide a model-appropriate default.

default_residual_threshold

default_residual_threshold: float = 0.06

Model-specific default for the FBCache relative difference threshold.

Subclasses may override this to provide a model-appropriate default. Used when the request does not specify a residual_threshold.
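To make the threshold concrete, here is a minimal FBCache-style check. This is a sketch under the assumption that "relative difference" means the mean absolute change of the first-block residual divided by its previous magnitude; the function name is illustrative and not part of MAX:

```python
import numpy as np

def should_reuse_cache(new_residual: np.ndarray,
                       old_residual: np.ndarray,
                       threshold: float = 0.06) -> bool:
    """Illustrative FBCache-style check: reuse the cached output when the
    first-block residual has changed by less than `threshold` (relative L1)."""
    denom = np.abs(old_residual).mean()
    if denom == 0.0:
        return False  # no previous residual magnitude to compare against
    rel_diff = np.abs(new_residual - old_residual).mean() / denom
    return bool(rel_diff < threshold)
```

With the default of 0.06, a step whose first-block residual moved by less than 6% of its prior magnitude would skip the remaining transformer blocks and reuse the cached output.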

execute()

execute(model_inputs, output_type='np')

Run the Z-Image denoising loop and decode outputs.

Parameters:

  • model_inputs (ZImageModelInputs)
  • output_type (Literal['np', 'latent', 'pil'])

Return type:

ZImagePipelineOutput
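The denoising loop this method runs can be sketched generically. The following toy illustration assumes an Euler-style update per step and stands in for the compiled transformer with a plain callable; it is not the actual MAX implementation:

```python
import numpy as np

def denoising_loop(latents, timesteps, dts, transformer):
    """Toy denoising loop: one transformer call and one Euler update per
    step. `transformer` stands in for the compiled Z-Image transformer."""
    for t, dt in zip(timesteps, dts):
        noise_pred = transformer(latents, t)
        latents = latents + dt * noise_pred  # Euler step per timestep
    return latents
```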

init_remaining_components()

init_remaining_components()

Initialize derived attributes and compiled subgraphs.

Return type:

None

prepare_img2img_latents()

prepare_img2img_latents(noise_latents, image_tensor, sigmas)

Return type:

Tensor
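This method carries no docstring. A common flow-matching img2img initialization, shown here purely as a hedged illustration of what the (noise_latents, image_tensor, sigmas) signature suggests, blends the encoded image with noise at the first sigma:

```python
import numpy as np

def img2img_init(noise_latents, image_latents, sigma0: float):
    """Illustrative flow-matching img2img start point: linear interpolation
    between clean image latents and pure noise at the first sigma."""
    return sigma0 * noise_latents + (1.0 - sigma0) * image_latents
```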

prepare_inputs()

prepare_inputs(context)

Convert a PixelGenerationContext into model inputs with device tensors.

Parameters:

context (PixelContext)

Return type:

ZImageModelInputs

prepare_prompt_embeddings()

prepare_prompt_embeddings(tokens, num_images_per_prompt)

Encode prompt tokens into text embeddings.

Parameters:

  • tokens (Tensor)
  • num_images_per_prompt (int)

Return type:

Tensor
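The num_images_per_prompt handling plausibly amounts to duplicating each prompt's encoded embeddings along the batch axis so every generated image has its own copy; a self-contained sketch of that step (an assumption, not the MAX code):

```python
import numpy as np

def duplicate_embeddings(prompt_embeds: np.ndarray,
                         num_images_per_prompt: int) -> np.ndarray:
    """Illustrative duplication step: repeat each prompt's embeddings
    along the batch axis, one copy per generated image."""
    # (batch, seq, dim) -> (batch * num_images_per_prompt, seq, dim)
    return np.repeat(prompt_embeds, num_images_per_prompt, axis=0)
```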

prepare_scheduler()

static prepare_scheduler(sigmas)

Precompute denoising timesteps and step deltas.

Parameters:

sigmas (Tensor)

Return type:

tuple[Tensor, Tensor]
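A plausible shape for this precomputation, sketched with NumPy (the real method operates on MAX Tensors; pairing each sigma with the signed step to the next one is an assumption based on the Euler-style loop):

```python
import numpy as np

def prepare_scheduler_sketch(sigmas: np.ndarray):
    """Hypothetical timestep/delta precomputation: the timestep for step i
    is sigmas[i], and dt is the signed step to the next sigma (the schedule
    runs from high noise down toward 0)."""
    timesteps = sigmas[:-1]
    dts = sigmas[1:] - sigmas[:-1]  # negative for a decreasing schedule
    return timesteps, dts
```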

preprocess_latents()

preprocess_latents(latents, dtype)

Patchify and pack latents before denoising.

Return type:

Tensor
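Patchify-and-pack typically folds small spatial patches into the feature dimension and flattens space into a token sequence. This NumPy sketch shows the idea; the patch size and channel layout are assumptions, not the Z-Image values:

```python
import numpy as np

def patchify(latents: np.ndarray, p: int = 2) -> np.ndarray:
    """Illustrative patchify/pack: fold p x p spatial patches into the
    feature dim, (C, H, W) -> (H*W / p^2, C * p^2)."""
    c, h, w = latents.shape
    x = latents.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)          # (H/p, W/p, C, p, p)
    return x.reshape((h // p) * (w // p), c * p * p)
```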

run_transformer()

run_transformer(cache_state, **kwargs)

Run the transformer for one denoising step.

Subclasses must override this to call their transformer with the appropriate model-specific arguments. The method should return (noise_pred,) when first_block_caching is disabled, or (new_residual, noise_pred) when first_block_caching is enabled.

Parameters:

  • cache_state (DenoisingCacheState) – Per-request mutable cache state for this stream.
  • **kwargs (Any) – Model-specific arguments forwarded from run_denoising_step.

Return type:

tuple[Tensor, ...]

scheduler_step()

static scheduler_step(latents, noise_pred, dt)

Return type:

Tensor
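Given the (latents, noise_pred, dt) signature, the step is plausibly a plain Euler update along the predicted velocity; a hedged NumPy sketch, not the MAX implementation:

```python
import numpy as np

def scheduler_step_sketch(latents, noise_pred, dt):
    """Hypothetical Euler update matching the (latents, noise_pred, dt)
    signature: move the latents along the prediction by dt."""
    return latents + dt * noise_pred
```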

text_encoder

text_encoder: Qwen3TextEncoderZImageModel

transformer

transformer: ZImageTransformerModel

unprefixed_weight_component

unprefixed_weight_component: str | None = 'transformer'

When set, weight files without a <component>/ prefix are assigned to this component. This supports multi-repo layouts where quantized weights for one component (e.g. the transformer) are shipped as flat files in a separate repo while the remaining components use the base model repo.
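The routing rule described above can be sketched as follows. Only the `<component>/` prefix convention comes from the docs; the function name and return shape are illustrative:

```python
def route_weight(name: str,
                 unprefixed_component: str = "transformer") -> tuple[str, str]:
    """Illustrative routing rule: a '<component>/' prefix selects the
    component; names without a prefix fall back to unprefixed_component."""
    if "/" in name:
        component, param = name.split("/", 1)
        return component, param
    return unprefixed_component, name
```

Under this rule, a flat quantized-transformer repo's weight names land on the transformer while prefixed names from the base repo still route to their own components.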

vae

vae: AutoencoderKLModel

ZImageTransformerModel

class max.pipelines.architectures.z_image_modulev3.ZImageTransformerModel(config, encoding, devices, weights, *, cache_config=None)

Bases: ComponentModel

Component wrapper for the compiled Z-Image transformer graph.

load_model()

load_model()

Load the runtime model instance.

Return type:

None

model

model: Callable[..., Any]
