Python module
max.pipelines.architectures.qwen_image_edit
Qwen-Image-Edit diffusion architecture for image editing.
QwenImageEditPipeline
class max.pipelines.architectures.qwen_image_edit.QwenImageEditPipeline(pipeline_config, session, devices, weight_paths, cache_config=None, **kwargs)
Bases: DiffusionPipeline
Diffusion pipeline for QwenImage image editing.
Wires together:
- Qwen2.5-VL prompt encoder
- QwenImage edit transformer denoiser
- QwenImage 3D VAE (with latents_mean/std normalization)
- Image-conditioning path (VAE encode -> normalize -> patchify -> concat)
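The packing step in the image-conditioning path can be sketched in NumPy. This is an illustrative sketch only: the 2x2 patch size and token layout follow the Flux-style packing convention and are assumptions here, and the real pipeline performs this on device tensors inside the graph.

```python
import numpy as np

def patchify_2x2(latents: np.ndarray) -> np.ndarray:
    # Pack (B, C, H, W) latents into (B, H//2 * W//2, C * 4) patch tokens.
    b, c, h, w = latents.shape
    x = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # -> (B, H/2, W/2, C, 2, 2)
    return x.reshape(b, (h // 2) * (w // 2), c * 4)

# Noise tokens come first; VAE-encoded reference-image tokens are
# concatenated after them along the sequence dimension.
noise_tokens = np.zeros((1, 256, 64))                    # packed noise latents
image_tokens = patchify_2x2(np.zeros((1, 16, 32, 32)))   # packed reference image
sequence = np.concatenate([noise_tokens, image_tokens], axis=1)
```

The transformer then attends over the full concatenated sequence, which is how the reference image conditions the edit.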
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- weight_paths (list[Path])
- cache_config (DenoisingCacheConfig | None)
- kwargs (Any)
PROMPT_TEMPLATE_DROP_IDX
PROMPT_TEMPLATE_DROP_IDX = 34
components
components: dict[str, type[ComponentModel]] | None = {'text_encoder': <class 'max.pipelines.architectures.qwen2_5vl.encoder.model.Qwen25VLEncoderModel'>, 'transformer': <class 'max.pipelines.architectures.qwen_image_edit.model.QwenImageEditTransformerModel'>, 'vae': <class 'max.pipelines.architectures.autoencoders.autoencoder_kl_qwen_image.AutoencoderKLQwenImageModel'>}
concat_image_latents()
concat_image_latents(latents, image_latents, latent_image_ids, image_latent_ids)
Parameters:
- latents (TensorValue)
- image_latents (TensorValue)
- latent_image_ids (TensorValue)
- image_latent_ids (TensorValue)
Return type:
decode_latents()
decode_latents(latents, height, width, output_type='np')
Decode packed latents into an image array.
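The unpacking that precedes the VAE decode can be sketched as the inverse of the 2x2 patch packing. This is an assumption-laden sketch (shapes are in latent space; `h` and `w` are the packed grid dimensions), not the pipeline's actual implementation.

```python
import numpy as np

def unpatchify_2x2(tokens: np.ndarray, h: int, w: int) -> np.ndarray:
    # Unpack (B, h*w, C*4) patch tokens back into (B, C, 2h, 2w) latents.
    b, n, d = tokens.shape
    c = d // 4
    x = tokens.reshape(b, h, w, c, 2, 2)
    x = x.transpose(0, 3, 1, 4, 2, 5)  # -> (B, C, h, 2, w, 2)
    return x.reshape(b, c, h * 2, w * 2)

latents = unpatchify_2x2(np.zeros((1, 256, 64)), 16, 16)
```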
execute()
execute(model_inputs, output_type='np')
Run the QwenImageEdit denoising loop and decode outputs.
Parameters:
- model_inputs (QwenImageEditModelInputs)
- output_type (Literal['np', 'latent'])
Return type:
QwenImageEditPipelineOutput
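The denoising loop that `execute` runs can be sketched as a flow-matching Euler integration over a decreasing sigma schedule. `toy_velocity` below is a hypothetical stand-in for the edit transformer, and the 4-step schedule is illustrative only.

```python
import numpy as np

def toy_velocity(latents: np.ndarray, sigma: float) -> np.ndarray:
    # Hypothetical stand-in for the transformer's velocity prediction.
    return latents

# Euler loop: each step moves the latents by dt = sigma_{i+1} - sigma_i
# along the predicted velocity.
sigmas = np.linspace(1.0, 0.0, 5)
latents = np.ones((1, 256, 64))
for i in range(len(sigmas) - 1):
    dt = sigmas[i + 1] - sigmas[i]        # negative: stepping toward the data
    noise_pred = toy_velocity(latents, sigmas[i])
    latents = latents + dt * noise_pred   # Euler update
```

With `output_type='np'` the final latents are decoded through the VAE; with `output_type='latent'` the loop's raw output is returned.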
init_remaining_components()
init_remaining_components()
Initialize derived attributes that depend on loaded components.
Return type:
None
prepare_image_latents()
prepare_image_latents(images, batch_size, device)
prepare_inputs()
prepare_inputs(context)
Convert a PixelContext into QwenImageEditModelInputs.
Parameters:
context (PixelContext)
Return type:
QwenImageEditModelInputs
prepare_prompt_embeddings()
prepare_prompt_embeddings(tokens, num_images_per_prompt=1)
Parameters:
- tokens (TokenBuffer)
- num_images_per_prompt (int)
Return type:
prepare_scheduler()
prepare_scheduler(sigmas)
Precompute timesteps and dt values from sigmas.
Parameters:
sigmas (TensorValue)
Return type:
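The precompute can be sketched as follows. The 1000x timestep scaling is the common diffusers convention and an assumption here; the pipeline builds these values as graph tensors rather than NumPy arrays.

```python
import numpy as np

# From a sigma schedule, derive one timestep per denoising step and the
# dt used by each Euler update (negative, since sigmas decrease).
sigmas = np.linspace(1.0, 0.0, 5)
timesteps = sigmas[:-1] * 1000.0
dts = np.diff(sigmas)
```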
preprocess_latents()
preprocess_latents(latents, latent_image_ids)
prompt_encoder
prompt_encoder: Qwen25VLMultimodalEncoderModel | None = None
scheduler_step()
scheduler_step(latents, noise_pred, dt, img_ids)
Single Euler step that updates only the noise-token prefix.
Parameters:
- latents (TensorValue)
- noise_pred (TensorValue)
- dt (TensorValue)
- img_ids (TensorValue)
Return type:
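The prefix-only update can be sketched as below. `euler_step_prefix` and its `num_noise_tokens` argument are hypothetical names for illustration; the real method derives the prefix from `img_ids` and operates on graph tensors.

```python
import numpy as np

def euler_step_prefix(latents, noise_pred, dt, num_noise_tokens):
    # Update only the noise-token prefix; the image-conditioning tokens
    # appended after it stay fixed across denoising steps.
    out = latents.copy()
    out[:, :num_noise_tokens] += dt * noise_pred[:, :num_noise_tokens]
    return out

latents = np.ones((1, 512, 64))
noise_pred = np.ones((1, 512, 64))
stepped = euler_step_prefix(latents, noise_pred, -0.25, 256)
```

Keeping the conditioning tokens frozen is what lets the reference image steer every step without itself being denoised.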
text_encoder
text_encoder: Qwen25VLEncoderModel
transformer
transformer: QwenImageEditTransformerModel
vae
vae: AutoencoderKLQwenImageModel