Python module

max.pipelines.architectures.qwen_image

Qwen-Image diffusion architecture for image generation.

QwenImageArchConfig

class max.pipelines.architectures.qwen_image.QwenImageArchConfig(*, pipeline_config)

Bases: ArchConfig

Pipeline-level configuration for Qwen-Image. Implements ArchConfig; the model does not use a KV cache.

Parameters:

pipeline_config (PipelineConfig)

get_max_seq_len()

get_max_seq_len()

Returns the default maximum sequence length for the model.

Subclasses should determine whether this value can be overridden by setting the --max-length (pipeline_config.model.max_length) flag.

Return type:

int
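
The override rule mentioned above is left to each subclass. A minimal sketch of one possible policy follows, assuming a hypothetical `resolve_max_seq_len` helper and a made-up default of 4096 (neither is part of the MAX API): an explicit `--max-length` value wins when present, otherwise the architecture default applies.

```python
from typing import Optional

# Hypothetical default, chosen only for illustration.
DEFAULT_MAX_SEQ_LEN = 4096


def resolve_max_seq_len(
    user_max_length: Optional[int],
    default: int = DEFAULT_MAX_SEQ_LEN,
) -> int:
    """Return the user override when given, otherwise the default."""
    if user_max_length is None:
        return default
    if user_max_length <= 0:
        raise ValueError("max_length must be a positive integer")
    return user_max_length
```

Whether QwenImageArchConfig honors the flag at all is not stated on this page; a subclass could equally ignore the override and always return its default.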

initialize()

classmethod initialize(pipeline_config, model_config=None)

Initialize the config from a PipelineConfig.

Parameters:

  • pipeline_config (PipelineConfig) – The pipeline configuration.
  • model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.

Return type:

Self
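
The `model_config` fallback described above can be illustrated without the MAX library. This is a sketch built on stand-in classes (`StubModelConfig`, `StubPipelineConfig`, and `StubArchConfig` are hypothetical and only mimic the documented behavior); it shows the default resolving to `pipeline_config.model` and an explicit draft-model config taking precedence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubModelConfig:
    """Stand-in for MAXModelConfig (hypothetical, illustration only)."""
    model_path: str


@dataclass
class StubPipelineConfig:
    """Stand-in for PipelineConfig with a main and an optional draft model."""
    model: StubModelConfig
    draft_model: Optional[StubModelConfig] = None


@dataclass
class StubArchConfig:
    """Stand-in arch config that records which model config it was built from."""
    pipeline_config: StubPipelineConfig
    model_path: str

    @classmethod
    def initialize(cls, pipeline_config, model_config=None):
        # Fall back to the pipeline's main model when no config is passed.
        if model_config is None:
            model_config = pipeline_config.model
        return cls(
            pipeline_config=pipeline_config,
            model_path=model_config.model_path,
        )


pipeline = StubPipelineConfig(
    model=StubModelConfig("main-model"),
    draft_model=StubModelConfig("draft-model"),
)
main_arch = StubArchConfig.initialize(pipeline)
draft_arch = StubArchConfig.initialize(pipeline, pipeline.draft_model)
```

With the real classes, the equivalent calls would presumably be `QwenImageArchConfig.initialize(pipeline_config)` and `QwenImageArchConfig.initialize(pipeline_config, pipeline_config.draft_model)`.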

pipeline_config

pipeline_config: PipelineConfig

QwenImageConfig

class max.pipelines.architectures.qwen_image.QwenImageConfig(*, config_file=None, section_name=None, patch_size=2, in_channels=64, out_channels=None, num_layers=60, attention_head_dim=128, num_attention_heads=24, joint_attention_dim=3584, guidance_embeds=False, axes_dims_rope=(16, 56, 56), rope_theta=10000, zero_cond_t=False, eps=1e-06, dtype=bfloat16, device=<factory>)

Bases: QwenImageConfigBase

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • patch_size (int)
  • in_channels (int)
  • out_channels (int | None)
  • num_layers (int)
  • attention_head_dim (int)
  • num_attention_heads (int)
  • joint_attention_dim (int)
  • guidance_embeds (bool)
  • axes_dims_rope (tuple[int, ...])
  • rope_theta (int)
  • zero_cond_t (bool)
  • eps (float)
  • dtype (DType)
  • device (DeviceRef)
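
To make the default values above easier to relate to one another, here is a small worked example in plain Python. `TransformerDims` is a hypothetical mirror of a few fields, not the real class, and the relationships it computes (hidden width as heads times head dimension, `out_channels=None` falling back to `in_channels`) are assumptions based on common diffusion-transformer conventions rather than anything this page confirms.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TransformerDims:
    """Plain-Python mirror of a few QwenImageConfig defaults (illustrative)."""
    patch_size: int = 2
    in_channels: int = 64
    out_channels: Optional[int] = None
    attention_head_dim: int = 128
    num_attention_heads: int = 24
    axes_dims_rope: Tuple[int, ...] = (16, 56, 56)

    @property
    def inner_dim(self) -> int:
        # Assumed convention: hidden width = heads * per-head dimension.
        return self.num_attention_heads * self.attention_head_dim

    @property
    def rope_dim(self) -> int:
        # Total rotary dimension summed over all positional axes.
        return sum(self.axes_dims_rope)

    @property
    def effective_out_channels(self) -> int:
        # Assumption: out_channels=None falls back to in_channels.
        if self.out_channels is None:
            return self.in_channels
        return self.out_channels


dims = TransformerDims()
print(dims.inner_dim)               # 24 * 128
print(dims.rope_dim)                # 16 + 56 + 56
print(dims.effective_out_channels)
```

With the documented defaults, the per-axis rotary dimensions sum to 128, which matches `attention_head_dim`.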

generate()

static generate(config_dict, encoding, devices)

Parameters:

  • config_dict (dict[str, Any])
  • encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'])
  • devices (list[Device])

Return type:

QwenImageConfigBase

model_config

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

QwenImageConfigBase

class max.pipelines.architectures.qwen_image.QwenImageConfigBase(*, config_file=None, section_name=None, patch_size=2, in_channels=64, out_channels=None, num_layers=60, attention_head_dim=128, num_attention_heads=24, joint_attention_dim=3584, guidance_embeds=False, axes_dims_rope=(16, 56, 56), rope_theta=10000, zero_cond_t=False, eps=1e-06, dtype=bfloat16, device=<factory>)

Bases: MAXModelConfigBase

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • patch_size (int)
  • in_channels (int)
  • out_channels (int | None)
  • num_layers (int)
  • attention_head_dim (int)
  • num_attention_heads (int)
  • joint_attention_dim (int)
  • guidance_embeds (bool)
  • axes_dims_rope (tuple[int, ...])
  • rope_theta (int)
  • zero_cond_t (bool)
  • eps (float)
  • dtype (DType)
  • device (DeviceRef)

attention_head_dim

attention_head_dim: int

axes_dims_rope

axes_dims_rope: tuple[int, ...]

device

device: DeviceRef

dtype

dtype: DType

eps

eps: float

guidance_embeds

guidance_embeds: bool

in_channels

in_channels: int

joint_attention_dim

joint_attention_dim: int

model_config

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

num_attention_heads

num_attention_heads: int

num_layers

num_layers: int

out_channels

out_channels: int | None

patch_size

patch_size: int

rope_theta

rope_theta: int

zero_cond_t

zero_cond_t: bool

QwenImageTransformerModel

class max.pipelines.architectures.qwen_image.QwenImageTransformerModel(config, encoding, devices, weights, session)

Bases: ComponentModel

Parameters:

  • config
  • encoding
  • devices
  • weights
  • session

load_model()

load_model()

Load and return a runtime model instance.

Return type:

Callable[[…], Any]