Python module

max.pipelines.architectures.qwen_image

Qwen-Image diffusion architecture for image generation.

`QwenImageArchConfig`

class max.pipelines.architectures.qwen_image.QwenImageArchConfig(*, pipeline_config)

Bases: ArchConfig

Pipeline-level config for QwenImage (implements ArchConfig; no KV cache).

Parameters: pipeline_config (PipelineConfig)

`get_max_seq_len()`

get_max_seq_len()

Returns the default maximum sequence length for the model.

Subclasses should determine whether this value can be overridden by setting the --max-length (pipeline_config.model.max_length) flag.

Return type: int

`initialize()`

classmethod initialize(pipeline_config, model_config=None)

Initialize the config from a PipelineConfig.

Parameters:

pipeline_config (PipelineConfig) – The pipeline configuration.
model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.

Return type:

Self
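The fallback described for `model_config` can be sketched without the real classes. The stub types below (`StubModelConfig`, `StubPipelineConfig`, `StubArchConfig`) and the `resolved_model` field are hypothetical stand-ins invented for illustration; they only mirror the documented behavior that `None` selects `pipeline_config.model`, and they say nothing about how the library implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubModelConfig:
    """Hypothetical stand-in for MAXModelConfig."""
    model_path: str


@dataclass
class StubPipelineConfig:
    """Hypothetical stand-in for PipelineConfig."""
    model: StubModelConfig
    draft_model: Optional[StubModelConfig] = None


@dataclass
class StubArchConfig:
    """Hypothetical stand-in for an ArchConfig subclass."""
    pipeline_config: StubPipelineConfig
    resolved_model: StubModelConfig  # illustrative field, not part of the real API

    @classmethod
    def initialize(cls, pipeline_config, model_config=None):
        # Fall back to the pipeline's primary model when no config is given.
        if model_config is None:
            model_config = pipeline_config.model
        return cls(pipeline_config=pipeline_config, resolved_model=model_config)


pipeline = StubPipelineConfig(
    model=StubModelConfig("main"), draft_model=StubModelConfig("draft")
)
default_arch = StubArchConfig.initialize(pipeline)
draft_arch = StubArchConfig.initialize(pipeline, pipeline.draft_model)
```

Here `default_arch` is built from the primary model and `draft_arch` from the explicitly passed draft model.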

`pipeline_config`

pipeline_config: PipelineConfig

`QwenImageConfig`

class max.pipelines.architectures.qwen_image.QwenImageConfig(*, config_file=None, section_name=None, patch_size=2, in_channels=64, out_channels=None, num_layers=60, attention_head_dim=128, num_attention_heads=24, joint_attention_dim=3584, guidance_embeds=False, axes_dims_rope=(16, 56, 56), rope_theta=10000, zero_cond_t=False, eps=1e-06, dtype=bfloat16, device=<factory>)

Bases: QwenImageConfigBase

Parameters:

config_file (str | None)
section_name (str | None)
patch_size (int)
in_channels (int)
out_channels (int | None)
num_layers (int)
attention_head_dim (int)
num_attention_heads (int)
joint_attention_dim (int)
guidance_embeds (bool)
axes_dims_rope (tuple[int, ...])
rope_theta (int)
zero_cond_t (bool)
eps (float)
dtype (DType)
device (DeviceRef)
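As a rough illustration of how the defaults relate to one another, the sketch below recomputes two derived sizes from the default values in the signature. The `TransformerDims` class is a hypothetical stand-in, and the two conventions it assumes (hidden width equals heads times head dimension, and the RoPE axis dimensions sum to the head dimension) are common in diffusion transformers but are not stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerDims:
    """Hypothetical stand-in holding a few QwenImageConfig defaults."""
    num_attention_heads: int = 24
    attention_head_dim: int = 128
    axes_dims_rope: tuple[int, ...] = (16, 56, 56)

    @property
    def inner_dim(self) -> int:
        # Assumed convention: hidden width = heads * per-head dimension.
        return self.num_attention_heads * self.attention_head_dim

    @property
    def rope_dim(self) -> int:
        # Total rotary dimensions summed over all positional axes.
        return sum(self.axes_dims_rope)


dims = TransformerDims()
print(dims.inner_dim)  # 24 * 128 = 3072
print(dims.rope_dim)   # 16 + 56 + 56 = 128
```

With the defaults, the summed RoPE dimensions equal `attention_head_dim`, which is consistent with the assumed convention.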

`generate()`

static generate(config_dict, encoding, devices)

Parameters:

config_dict (dict[str, Any])
encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'])
devices (list[Device])

Return type:

QwenImageConfigBase
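Because `encoding` is a `Literal` of fixed names, callers may want to check a user-supplied string before passing it on. The sketch below shows one way to do that with the standard `typing` module; `SupportedEncoding` and `check_encoding` are hypothetical names introduced here, and the list of values is copied from the signature above rather than imported from the library.

```python
from typing import Literal, get_args

# Encoding names copied from the documented signature.
SupportedEncoding = Literal[
    "float32", "bfloat16", "q4_k", "q4_0", "q6_k",
    "float8_e4m3fn", "float4_e2m1fnx2", "gptq",
]


def check_encoding(name: str) -> str:
    """Hypothetical helper: return name if it is a listed encoding."""
    allowed = get_args(SupportedEncoding)
    if name not in allowed:
        raise ValueError(
            f"unsupported encoding {name!r}; expected one of {allowed}"
        )
    return name
```

A call such as `check_encoding("bfloat16")` returns the name unchanged, while an unlisted name raises `ValueError`.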

`model_config`

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}

Configuration for the model. It should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).
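To show what these settings mean in practice, here is a minimal sketch using pydantic v2 directly. `TinyConfig` is a hypothetical model that reuses the same three `ConfigDict` values; it is not one of the classes documented here, and the misspelled `pach_size` argument is deliberate.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class TinyConfig(BaseModel):
    """Hypothetical model using the same ConfigDict settings."""
    model_config = ConfigDict(
        arbitrary_types_allowed=True, extra="forbid", strict=False
    )
    patch_size: int = 2


# strict=False lets pydantic coerce the string "4" to an int.
ok = TinyConfig(patch_size="4")

# extra="forbid" rejects fields the model does not declare.
try:
    TinyConfig(patch_size=2, pach_size=4)
    rejected = False
except ValidationError:
    rejected = True
```

Under these settings a misspelled keyword fails loudly instead of being silently ignored.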

`QwenImageConfigBase`

class max.pipelines.architectures.qwen_image.QwenImageConfigBase(*, config_file=None, section_name=None, patch_size=2, in_channels=64, out_channels=None, num_layers=60, attention_head_dim=128, num_attention_heads=24, joint_attention_dim=3584, guidance_embeds=False, axes_dims_rope=(16, 56, 56), rope_theta=10000, zero_cond_t=False, eps=1e-06, dtype=bfloat16, device=<factory>)

Bases: MAXModelConfigBase

Parameters:

config_file (str | None)
section_name (str | None)
patch_size (int)
in_channels (int)
out_channels (int | None)
num_layers (int)
attention_head_dim (int)
num_attention_heads (int)
joint_attention_dim (int)
guidance_embeds (bool)
axes_dims_rope (tuple[int, ...])
rope_theta (int)
zero_cond_t (bool)
eps (float)
dtype (DType)
device (DeviceRef)

`attention_head_dim`

attention_head_dim: int

`axes_dims_rope`

axes_dims_rope: tuple[int, ...]

`device`

device: DeviceRef

`dtype`

dtype: DType

`eps`

eps: float

`guidance_embeds`

guidance_embeds: bool

`in_channels`

in_channels: int

`joint_attention_dim`

joint_attention_dim: int

`model_config`

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}

Configuration for the model. It should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

`num_attention_heads`

num_attention_heads: int

`num_layers`

num_layers: int

`out_channels`

out_channels: int | None

`patch_size`

patch_size: int

`rope_theta`

rope_theta: int

`zero_cond_t`

zero_cond_t: bool

`QwenImageTransformerModel`

class max.pipelines.architectures.qwen_image.QwenImageTransformerModel(config, encoding, devices, weights, session)

Bases: ComponentModel

Parameters:

config (dict[str, Any])
encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'])
devices (list[Device])
weights (Weights)
session (InferenceSession)

`load_model()`

load_model()

Load and return a runtime model instance.

Return type: Callable[[…], Any]
