Python module

max.pipelines.architectures.qwen_image_edit

Qwen-Image-Edit diffusion architecture for image editing.

QwenImageEditPipeline

class max.pipelines.architectures.qwen_image_edit.QwenImageEditPipeline(pipeline_config, session, devices, weight_paths, cache_config=None, **kwargs)

Bases: DiffusionPipeline

Diffusion pipeline for QwenImage image editing.

Wires together:

  • Qwen2.5-VL prompt encoder
  • QwenImage edit transformer denoiser
  • QwenImage 3D VAE (with latents_mean/std normalization)
  • Image-conditioning path (VAE encode -> normalize -> patchify -> concat)
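
The composition above can be sketched as a toy flow-matching loop. Everything below is an illustrative stand-in with trivial behavior, not the MAX API; the real components are the Qwen2.5-VL encoder, the edit transformer, and the 3D VAE.

```python
# Toy, self-contained sketch of the stage composition listed above.
# Every function here is a hypothetical stand-in; none of these names
# exist in max.pipelines.

def vae_encode(image):
    # Stand-in: treat pixel values as "latents" directly.
    return list(image)

def normalize(latents, mean=0.0, std=1.0):
    # Stand-in for the latents_mean/latents_std normalization.
    return [(x - mean) / std for x in latents]

def transformer(tokens, sigma):
    # Stand-in denoiser: predicted velocity equals the current value,
    # so each Euler step shrinks the latents toward zero.
    return list(tokens)

def euler_denoise(latents, sigmas):
    for i in range(len(sigmas) - 1):
        dt = sigmas[i + 1] - sigmas[i]      # negative: sigma decreases
        v = transformer(latents, sigmas[i])
        latents = [x + dt * vi for x, vi in zip(latents, v)]
    return latents

lat = normalize(vae_encode([1.0, -2.0, 0.5]))
out = euler_denoise(lat, [1.0, 0.5, 0.0])   # out -> [0.25, -0.5, 0.125]
```

With `dt = -0.5` each step scales the latents by `1 + dt = 0.5`, so two steps quarter them; the real pipeline replaces each stand-in with the corresponding learned component.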

PROMPT_TEMPLATE_DROP_IDX

PROMPT_TEMPLATE_DROP_IDX = 34

components

components: dict[str, type[ComponentModel]] | None = {'text_encoder': <class 'max.pipelines.architectures.qwen2_5vl.encoder.model.Qwen25VLEncoderModel'>, 'transformer': <class 'max.pipelines.architectures.qwen_image_edit.model.QwenImageEditTransformerModel'>, 'vae': <class 'max.pipelines.architectures.autoencoders.autoencoder_kl_qwen_image.AutoencoderKLQwenImageModel'>}

concat_image_latents()

concat_image_latents(latents, image_latents, latent_image_ids, image_latent_ids)

Return type:

tuple[TensorValue, TensorValue]
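
A hedged sketch of what a concat step like this typically does in packed-sequence diffusion models: noise latents (and their position ids) come first, image-conditioning latents follow. The flat-list layout below is an assumption for illustration, not the tensor layout used here.

```python
# Hypothetical sequence-dimension concat of noise latents and
# image-conditioning latents, with matching position ids.

def concat_image_latents(latents, image_latents, latent_image_ids, image_latent_ids):
    tokens = latents + image_latents            # noise tokens first
    ids = latent_image_ids + image_latent_ids   # position ids follow suit
    return tokens, ids

tokens, ids = concat_image_latents([10.0, 11.0], [20.0], [0, 1], [2])
# tokens -> [10.0, 11.0, 20.0]; ids -> [0, 1, 2]
```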

decode_latents()

decode_latents(latents, height, width, output_type='np')

Decode packed latents into an image array.

Return type:

ndarray | Buffer
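
Before VAE decoding, packed latents have to be unpacked from patch sequences back to a spatial grid. The sketch below assumes the common 2x2 patch packing used by Flux-style transformers; whether this pipeline uses exactly this layout is an assumption.

```python
# Hypothetical unpacking of 2x2-patchified latents: a sequence of
# (height//2)*(width//2) patches, each holding channels*4 values in
# [channel, dy, dx] order, back to a [channel][y][x] grid.

def unpack_latents(packed, height, width, channels):
    w2 = width // 2
    grid = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for p, patch in enumerate(packed):
        py, px = divmod(p, w2)                  # patch position in the grid
        for c in range(channels):
            for dy in range(2):
                for dx in range(2):
                    grid[c][2 * py + dy][2 * px + dx] = patch[c * 4 + dy * 2 + dx]
    return grid

img = unpack_latents([[1.0, 2.0, 3.0, 4.0]], 2, 2, 1)
# img -> [[[1.0, 2.0], [3.0, 4.0]]]
```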

execute()

execute(model_inputs, output_type='np')

Run the QwenImageEdit denoising loop and decode outputs.

Parameters:

  • model_inputs (QwenImageEditModelInputs)
  • output_type (Literal['np', 'latent'])

Return type:

QwenImageEditPipelineOutput

init_remaining_components()

init_remaining_components()

Initialize derived attributes that depend on loaded components.

Return type:

None

prepare_image_latents()

prepare_image_latents(images, batch_size, device)

Return type:

tuple[Buffer, Buffer]
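
The class docstring mentions latents_mean/latents_std normalization on the VAE-encode path. A minimal sketch of that step and its inverse; the per-channel constants and flat-list layout here are illustrative, not the values shipped with the VAE.

```python
# Hypothetical per-channel latent normalization and its inverse.

def normalize_latents(latents, latents_mean, latents_std):
    return [(x - m) / s for x, m, s in zip(latents, latents_mean, latents_std)]

def denormalize_latents(latents, latents_mean, latents_std):
    return [x * s + m for x, m, s in zip(latents, latents_mean, latents_std)]

z = normalize_latents([1.0, 2.0], [0.5, 0.0], [2.0, 4.0])
# z -> [0.25, 0.5]; denormalize_latents recovers the inputs exactly.
```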

prepare_inputs()

prepare_inputs(context)

Convert a PixelContext into QwenImageEditModelInputs.

Parameters:

context (PixelContext)

Return type:

QwenImageEditModelInputs

prepare_prompt_embeddings()

prepare_prompt_embeddings(tokens, num_images_per_prompt=1)

Return type:

Buffer
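
A hedged guess at what `num_images_per_prompt` implies, based on the convention in diffusion pipelines generally: embeddings are computed once per prompt and repeated per requested image. The function below is a stand-in, not this method's implementation.

```python
# Hypothetical repetition of prompt embeddings, one copy per image
# generated for the same prompt.

def repeat_for_images(prompt_embeds, num_images_per_prompt=1):
    return [row for row in prompt_embeds for _ in range(num_images_per_prompt)]

reps = repeat_for_images([[0.1, 0.2]], num_images_per_prompt=2)
# reps -> [[0.1, 0.2], [0.1, 0.2]]
```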

prepare_scheduler()

prepare_scheduler(sigmas)

Precompute timesteps and dt values from sigmas.

Parameters:

sigmas (TensorValue)

Return type:

tuple[TensorValue, TensorValue]
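
In flow-matching Euler schedulers, timestep i is typically sigma_i and dt_i is the difference to the next sigma; the sketch below assumes that convention rather than reproducing this method's exact formulas.

```python
# Assumed flow-matching convention: one (timestep, dt) pair per
# denoising step, derived from a decreasing sigma schedule.

def prepare_scheduler(sigmas):
    timesteps = sigmas[:-1]
    dts = [sigmas[i + 1] - sigmas[i] for i in range(len(sigmas) - 1)]
    return timesteps, dts

timesteps, dts = prepare_scheduler([1.0, 0.75, 0.5, 0.25, 0.0])
# timesteps -> [1.0, 0.75, 0.5, 0.25]
# dts       -> [-0.25, -0.25, -0.25, -0.25]
```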

preprocess_latents()

preprocess_latents(latents, latent_image_ids)

Return type:

tuple[Buffer, Buffer]

prompt_encoder

prompt_encoder: Qwen25VLMultimodalEncoderModel | None = None

scheduler_step()

scheduler_step(latents, noise_pred, dt, img_ids)

Single Euler step that updates only the noise-token prefix.

Return type:

TensorValue
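
"Updates only the noise-token prefix" can be illustrated as follows. The explicit split index and flat-list layout are assumptions for the sketch; the actual method takes `img_ids` rather than a count.

```python
# Hypothetical Euler step over a packed sequence: only the leading noise
# tokens move; trailing image-conditioning tokens pass through unchanged.

def scheduler_step(latents, noise_pred, dt, n_noise_tokens):
    stepped = [x + dt * v
               for x, v in zip(latents[:n_noise_tokens],
                               noise_pred[:n_noise_tokens])]
    return stepped + latents[n_noise_tokens:]

out = scheduler_step([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], -0.5, 2)
# out -> [0.75, 1.75, 3.0, 4.0]
```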

text_encoder

text_encoder: Qwen25VLEncoderModel

transformer

transformer: QwenImageEditTransformerModel

vae

vae: AutoencoderKLQwenImageModel
