Python module

max.pipelines.architectures.qwen_image

Qwen-Image diffusion architecture for image generation.

QwenImageArchConfig

class max.pipelines.architectures.qwen_image.QwenImageArchConfig(*, pipeline_config)

Bases: ArchConfig

Pipeline-level config for QwenImage (implements ArchConfig; no KV cache).

Parameters:

pipeline_config (PipelineConfig)

get_max_seq_len()

get_max_seq_len()

Returns the default maximum sequence length for the model.

Subclasses determine whether this value can be overridden via the --max-length flag (pipeline_config.model.max_length).

Return type:

int

initialize()

classmethod initialize(pipeline_config, model_config=None)

Initialize the config from a PipelineConfig.

Parameters:

  • pipeline_config (PipelineConfig) – The pipeline configuration.
  • model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.

Return type:

Self

pipeline_config

pipeline_config: PipelineConfig

QwenImagePipeline

class max.pipelines.architectures.qwen_image.QwenImagePipeline(pipeline_config, session, devices, weight_paths, cache_config=None, **kwargs)

Bases: DiffusionPipeline

Diffusion pipeline for QwenImage text-to-image generation.

Wires together:

  • Qwen2.5-VL text encoder
  • QwenImage transformer denoiser (60 dual-stream blocks)
  • QwenImage 3D VAE (with latents_mean/std normalization)

Parameters:

  • pipeline_config
  • session
  • devices
  • weight_paths
  • cache_config (default None)
  • **kwargs

PROMPT_TEMPLATE_DROP_IDX

PROMPT_TEMPLATE_DROP_IDX = 34

components

components: dict[str, type[ComponentModel]] | None = {
    'text_encoder': max.pipelines.architectures.qwen2_5vl.encoder.model.Qwen25VLEncoderModel,
    'transformer': max.pipelines.architectures.qwen_image.model.QwenImageTransformerModel,
    'vae': max.pipelines.architectures.autoencoders.autoencoder_kl_qwen_image.AutoencoderKLQwenImageModel,
}

decode_latents()

decode_latents(latents, height, width, output_type='np')

Decode packed latents into an image array.

Parameters:

  • latents
  • height
  • width
  • output_type (default 'np')

Return type:

ndarray | Buffer
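
The latents_mean/std normalization mentioned for the VAE can be illustrated with a round trip. This is a sketch of one common diffusers-style convention (normalize after encode, invert before decode); the exact formula used by AutoencoderKLQwenImageModel, and all names below, are assumptions for illustration.

```python
import numpy as np

def normalize_latents(z, latents_mean, latents_std):
    """Applied after VAE encode so latents are roughly unit-scale."""
    return (z - latents_mean) / latents_std

def denormalize_latents(z_norm, latents_mean, latents_std):
    """Inverse transform, applied to latents before VAE decode."""
    return z_norm * latents_std + latents_mean

# Per-channel statistics (illustrative values).
mean = np.array([0.5])
std = np.array([2.0])

z = np.array([[3.0, -1.0]])
roundtrip = denormalize_latents(normalize_latents(z, mean, std), mean, std)
```

The round trip recovers the original latents exactly, which is the invariant any mean/std convention must satisfy.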

execute()

execute(model_inputs, output_type='np')

Run the QwenImage denoising loop and decode outputs.

Supports true classifier-free guidance with separate positive and negative prompt forward passes.

Parameters:

  • model_inputs (QwenImageModelInputs)
  • output_type (Literal['np', 'latent'])

Return type:

QwenImagePipelineOutput
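
The true classifier-free guidance described above, with separate positive and negative forward passes, combines the two predictions per step. A minimal NumPy sketch (the function name and `guidance_scale` parameter are illustrative, not part of the MAX API):

```python
import numpy as np

def apply_cfg(noise_pred_pos: np.ndarray,
              noise_pred_neg: np.ndarray,
              guidance_scale: float) -> np.ndarray:
    """Combine positive- and negative-prompt predictions.

    True CFG runs two denoiser forward passes and extrapolates from
    the negative prediction toward the positive one.
    """
    return noise_pred_neg + guidance_scale * (noise_pred_pos - noise_pred_neg)

pos = np.array([1.0, 2.0])
neg = np.array([0.0, 1.0])
guided = apply_cfg(pos, neg, guidance_scale=4.0)
```

With guidance_scale=1.0 the guided prediction reduces to the positive-prompt prediction; larger scales push further away from the negative prompt.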

init_remaining_components()

init_remaining_components()

Initialize derived attributes that depend on loaded components.

Return type:

None

prepare_inputs()

prepare_inputs(context)

Convert a PixelContext into QwenImageModelInputs.

Parameters:

context (PixelContext)

Return type:

QwenImageModelInputs

prepare_prompt_embeddings()

prepare_prompt_embeddings(tokens, num_images_per_prompt=1)

Create prompt embeddings from tokens.

QwenImage uses the last hidden state from the text encoder (layer -1). The tokens include a chat template prefix (~34 tokens) that must be dropped from the encoder output to match diffusers’ behavior.

Parameters:

  • tokens
  • num_images_per_prompt (default 1)

Return type:

Buffer
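
Dropping the chat-template prefix from the encoder output, as described above, amounts to slicing off the first PROMPT_TEMPLATE_DROP_IDX positions along the sequence axis. A sketch in NumPy (the helper name is illustrative; the constant matches the class attribute):

```python
import numpy as np

PROMPT_TEMPLATE_DROP_IDX = 34  # chat-template prefix length, per the class attribute

def drop_template_prefix(hidden_states: np.ndarray) -> np.ndarray:
    """Drop the chat-template prefix tokens from the text encoder's
    last hidden state, matching diffusers' behavior.

    hidden_states: (batch, seq_len, hidden_dim)
    """
    return hidden_states[:, PROMPT_TEMPLATE_DROP_IDX:, :]

h = np.zeros((1, 50, 8))        # 50 tokens, of which the first 34 are template
emb = drop_template_prefix(h)   # 16 prompt tokens remain
```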

prepare_scheduler()

prepare_scheduler(sigmas)

Precompute timesteps and dt values from sigmas.

Parameters:

sigmas (TensorValue)

Return type:

tuple[TensorValue, TensorValue]
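
Precomputing timesteps and dt from sigmas can be sketched as follows for a flow-matching Euler schedule. The scaling constant (1000 training timesteps) and the terminal sigma of 0 are common conventions assumed here, not values read from MAX:

```python
import numpy as np

def prepare_scheduler_np(sigmas: np.ndarray, num_train_timesteps: int = 1000):
    """Precompute per-step timesteps and dt values from a sigma schedule.

    Timesteps scale sigmas into the training range; dt is the step
    between consecutive sigmas (negative, since sigmas decrease),
    with a terminal sigma of 0 appended.
    """
    timesteps = sigmas * num_train_timesteps
    sigmas_ext = np.append(sigmas, 0.0)
    dts = np.diff(sigmas_ext)  # sigma_{i+1} - sigma_i
    return timesteps, dts

sigmas = np.array([1.0, 0.75, 0.5, 0.25])
timesteps, dts = prepare_scheduler_np(sigmas)
```

The dt values sum to -1, carrying the latents from pure noise (sigma 1) all the way to the data distribution (sigma 0).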

preprocess_latents()

preprocess_latents(latents, latent_image_ids)

Parameters:

  • latents
  • latent_image_ids

Return type:

tuple[Buffer, Buffer]

scheduler_step()

scheduler_step(latents, noise_pred, dt)

Apply a single Euler update step.

Parameters:

  • latents
  • noise_pred
  • dt

Return type:

TensorValue
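
The single Euler update above is a one-liner: move the latents along the predicted velocity by the step size dt. A NumPy sketch (the function name is illustrative):

```python
import numpy as np

def scheduler_step_np(latents: np.ndarray,
                      noise_pred: np.ndarray,
                      dt: float) -> np.ndarray:
    """Single Euler update: x_{t+1} = x_t + dt * v_t, where v_t is the
    model's predicted velocity and dt is negative as sigma decreases."""
    return latents + dt * noise_pred

x = np.array([1.0, -1.0])
v = np.array([2.0, 2.0])
x_next = scheduler_step_np(x, v, dt=-0.25)
```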

text_encoder

text_encoder: Qwen25VLEncoderModel

transformer

transformer: QwenImageTransformerModel

vae

vae: AutoencoderKLQwenImageModel

QwenImageTransformerModel

class max.pipelines.architectures.qwen_image.QwenImageTransformerModel(config, encoding, devices, weights, session)

Bases: ComponentModel

Parameters:

  • config
  • encoding
  • devices
  • weights
  • session

load_model()

load_model()

Load and return a runtime model instance.

Return type:

Callable[[…], Any]