Python module
max.pipelines.architectures.qwen_image
Qwen-Image diffusion architecture for image generation.
QwenImageArchConfig
class max.pipelines.architectures.qwen_image.QwenImageArchConfig(*, pipeline_config)
Bases: ArchConfig
Pipeline-level config for QwenImage (implements ArchConfig; no KV cache).
Parameters:
pipeline_config (PipelineConfig)
get_max_seq_len()
get_max_seq_len()
Returns the default maximum sequence length for the model.
Subclasses should determine whether this value can be overridden by
setting the --max-length (pipeline_config.model.max_length) flag.
Return type:
initialize()
classmethod initialize(pipeline_config, model_config=None)
Initialize the config from a PipelineConfig.
Parameters:
- pipeline_config (PipelineConfig) – The pipeline configuration.
- model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.
Return type:
pipeline_config
pipeline_config: PipelineConfig
QwenImagePipeline
class max.pipelines.architectures.qwen_image.QwenImagePipeline(pipeline_config, session, devices, weight_paths, cache_config=None, **kwargs)
Bases: DiffusionPipeline
Diffusion pipeline for QwenImage text-to-image generation.
Wires together:
- Qwen2.5-VL text encoder
- QwenImage transformer denoiser (60 dual-stream blocks)
- QwenImage 3D VAE (with latents_mean/std normalization)
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- weight_paths (list[Path])
- cache_config (DenoisingCacheConfig | None)
- kwargs (Any)
PROMPT_TEMPLATE_DROP_IDX
PROMPT_TEMPLATE_DROP_IDX = 34
components
components: dict[str, type[ComponentModel]] | None = {'text_encoder': <class 'max.pipelines.architectures.qwen2_5vl.encoder.model.Qwen25VLEncoderModel'>, 'transformer': <class 'max.pipelines.architectures.qwen_image.model.QwenImageTransformerModel'>, 'vae': <class 'max.pipelines.architectures.autoencoders.autoencoder_kl_qwen_image.AutoencoderKLQwenImageModel'>}
decode_latents()
decode_latents(latents, height, width, output_type='np')
Decode packed latents into an image array.
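Since the VAE applies latents_mean/std normalization (noted in the pipeline description above), decoding must first undo it. A minimal sketch of that inverse step, assuming the convention is the inverse of an encoder-side (x - mean) / std; `denormalize_latents` and the plain-list layout are illustrative, not the actual MAX implementation:

```python
def denormalize_latents(latents, latents_mean, latents_std):
    """Undo per-channel latent normalization before VAE decoding.

    `latents` is a list of per-channel value lists; `latents_mean` and
    `latents_std` hold one statistic per channel. Assumes the decoder
    expects x * std + mean (the inverse of (x - mean) / std).
    """
    return [
        [v * s + m for v in channel]
        for channel, m, s in zip(latents, latents_mean, latents_std)
    ]
```

The real pipeline performs the same arithmetic on device tensors before handing the result to the VAE decoder.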
execute()
execute(model_inputs, output_type='np')
Run the QwenImage denoising loop and decode outputs.
Supports true classifier-free guidance with separate positive and negative prompt forward passes.
Parameters:
- model_inputs (QwenImageModelInputs)
- output_type (Literal['np', 'latent'])
Return type:
QwenImagePipelineOutput
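"True" classifier-free guidance means the positive and negative prompts each get their own forward pass, after which the two predictions are combined. A sketch of the standard combination rule (function name and scalar inputs are illustrative; the pipeline applies this per tensor element):

```python
def cfg_combine(noise_cond, noise_uncond, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the conditional one. guidance_scale = 1.0
    # reduces to the conditional prediction alone.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```

Higher guidance scales push the denoised sample harder toward the positive prompt at the cost of diversity.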
init_remaining_components()
init_remaining_components()
Initialize derived attributes that depend on loaded components.
Return type:
None
prepare_inputs()
prepare_inputs(context)
Convert a PixelContext into QwenImageModelInputs.
Parameters:
context (PixelContext)
Return type:
QwenImageModelInputs
prepare_prompt_embeddings()
prepare_prompt_embeddings(tokens, num_images_per_prompt=1)
Create prompt embeddings from tokens.
QwenImage uses the last hidden state from the text encoder (layer -1). The tokens include a chat template prefix (~34 tokens) that must be dropped from the encoder output to match diffusers’ behavior.
Parameters:
- tokens (TokenBuffer)
- num_images_per_prompt (int)
Return type:
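The prefix-dropping described above can be sketched as a simple slice: the first PROMPT_TEMPLATE_DROP_IDX (34) positions of the encoder's last hidden state cover the chat-template prefix and are discarded. The function name and list-of-embeddings layout are illustrative:

```python
PROMPT_TEMPLATE_DROP_IDX = 34  # chat-template prefix length (class constant)

def drop_template_prefix(hidden_states, drop_idx=PROMPT_TEMPLATE_DROP_IDX):
    # hidden_states: one embedding per token from the encoder's last layer.
    # Drop the chat-template prefix so the remaining embeddings align with
    # the user prompt, matching diffusers' behavior.
    return hidden_states[drop_idx:]
```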
prepare_scheduler()
prepare_scheduler(sigmas)
Precompute timesteps and dt values from sigmas.
Parameters:
sigmas (TensorValue)
Return type:
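A hypothetical sketch of the precomputation, assuming flow-matching conventions: timesteps scale the sigmas into the scheduler's training range, and each dt is the gap between consecutive sigmas (negative for a decreasing schedule). Names and the `num_train_timesteps` default are assumptions, not the MAX implementation:

```python
def prepare_schedule(sigmas, num_train_timesteps=1000):
    # sigmas: decreasing noise levels, e.g. from 1.0 down to 0.0.
    # timesteps map each sigma into the model's conditioning range;
    # dts are the per-step increments consumed by the Euler update.
    timesteps = [s * num_train_timesteps for s in sigmas]
    dts = [sigmas[i + 1] - sigmas[i] for i in range(len(sigmas) - 1)]
    return timesteps, dts
```

Precomputing both lists once keeps the per-step denoising loop free of scheduler bookkeeping.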
preprocess_latents()
preprocess_latents(latents, latent_image_ids)
scheduler_step()
scheduler_step(latents, noise_pred, dt)
Apply a single Euler update step.
Parameters:
- latents (TensorValue)
- noise_pred (TensorValue)
- dt (TensorValue)
Return type:
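The Euler update above is the simplest ODE step: move the latents along the predicted velocity for one increment dt. A scalar sketch (the pipeline applies this elementwise to tensors):

```python
def euler_step(latent, noise_pred, dt):
    # One explicit-Euler step of the probability-flow ODE:
    # x_{t+dt} = x_t + dt * v(x_t, t),
    # where v is the transformer's predicted velocity and dt is the
    # precomputed (typically negative) sigma increment.
    return latent + dt * noise_pred
```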
text_encoder
text_encoder: Qwen25VLEncoderModel
transformer
transformer: QwenImageTransformerModel
vae
vae: AutoencoderKLQwenImageModel
QwenImageTransformerModel
class max.pipelines.architectures.qwen_image.QwenImageTransformerModel(config, encoding, devices, weights, session)
Bases: ComponentModel
Parameters:
- config
- encoding
- devices
- weights
- session
load_model()
load_model()
Load and return a runtime model instance.