Python module

max.pipelines.architectures.z_image_modulev3

Z-Image diffusion architecture for image generation.

ZImageArchConfig

class max.pipelines.architectures.z_image_modulev3.ZImageArchConfig(*, pipeline_config: PipelineConfig)

Bases: ArchConfig

Parameters:

pipeline_config (PipelineConfig)

get_max_seq_len()

get_max_seq_len()

Returns the default maximum sequence length for the model.

Subclasses should determine whether this value can be overridden by setting the --max-length (pipeline_config.model.max_length) flag.

Return type:

int

initialize()

classmethod initialize(pipeline_config, model_config=None)

Initialize the config from a PipelineConfig.

Parameters:

  • pipeline_config (PipelineConfig) – The pipeline configuration.
  • model_config (MAXModelConfig | None) – The model configuration to read from. When None (the default), pipeline_config.model is used. Pass an explicit config (e.g. pipeline_config.draft_model) to initialize the arch config for a different model.

Return type:

Self
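The fallback described for model_config can be sketched with plain dataclasses. The classes below (FakeModelConfig, FakePipelineConfig, FakeArchConfig) and the source_model_path field are hypothetical stand-ins invented for illustration; they are not the MAX classes, and the real initialize() may read different fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins; these are NOT the real MAX classes.
@dataclass
class FakeModelConfig:
    model_path: str


@dataclass
class FakePipelineConfig:
    model: FakeModelConfig
    draft_model: Optional[FakeModelConfig] = None


@dataclass
class FakeArchConfig:
    pipeline_config: FakePipelineConfig
    source_model_path: str  # illustrative field, assumed for this sketch

    @classmethod
    def initialize(cls, pipeline_config, model_config=None):
        # When no explicit model config is passed, fall back to the
        # pipeline's main model, as the parameter description states.
        if model_config is None:
            model_config = pipeline_config.model
        return cls(
            pipeline_config=pipeline_config,
            source_model_path=model_config.model_path,
        )


pipeline = FakePipelineConfig(
    model=FakeModelConfig("main-model"),
    draft_model=FakeModelConfig("draft-model"),
)

# Default: reads from pipeline.model.
default_arch = FakeArchConfig.initialize(pipeline)
# Explicit: reads from the draft model instead.
draft_arch = FakeArchConfig.initialize(pipeline, pipeline.draft_model)
```

In this sketch, default_arch is built from the main model and draft_arch from the draft model, while both keep a reference to the same pipeline configuration.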

pipeline_config

pipeline_config: PipelineConfig

ZImageConfig

class max.pipelines.architectures.z_image_modulev3.ZImageConfig(*, config_file=None, section_name=None, all_patch_size=(2,), all_f_patch_size=(1,), in_channels=16, dim=3840, n_layers=30, n_refiner_layers=2, n_heads=30, n_kv_heads=30, norm_eps=1e-05, qk_norm=True, cap_feat_dim=2560, rope_theta=256.0, t_scale=1000.0, axes_dims=(32, 48, 48), axes_lens=(1024, 512, 512), dtype=bfloat16, device=<factory>)

Bases: MAXModelConfigBase

all_f_patch_size

all_f_patch_size: tuple[int, ...]

all_patch_size

all_patch_size: tuple[int, ...]

axes_dims

axes_dims: tuple[int, ...]

axes_lens

axes_lens: tuple[int, ...]

cap_feat_dim

cap_feat_dim: int

device

device: DeviceRef

dim

dim: int

dtype

dtype: DType

fbcache_dims()

fbcache_dims()

Returns the (hidden_dim, output_dim) pair per image token for FBCache / Taylor tensors.

Return type:

tuple[int, int]

in_channels

in_channels: int

initialize_from_config()

classmethod initialize_from_config(config_dict, encoding, devices)

Parameters:

  • config_dict (dict[str, Any])
  • encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'])
  • devices (list[Device])

Return type:

Self
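Because encoding is restricted to a fixed set of strings, it can be useful to check a value before passing it on. The following is a minimal sketch using only the standard library; the Encoding alias and the check_encoding helper are hypothetical names introduced here, and the list of values is copied from the signature above rather than imported from MAX.

```python
from typing import Literal, get_args

# Values copied from the documented `encoding` parameter.
Encoding = Literal[
    "float32",
    "bfloat16",
    "q4_k",
    "q4_0",
    "q6_k",
    "float8_e4m3fn",
    "float4_e2m1fnx2",
    "gptq",
]


def check_encoding(name: str) -> str:
    """Hypothetical helper: return `name` if it is a documented encoding."""
    allowed = get_args(Encoding)  # tuple of the literal strings above
    if name not in allowed:
        raise ValueError(
            f"unsupported encoding {name!r}; expected one of {allowed}"
        )
    return name
```

Calling check_encoding("bfloat16") returns the string unchanged, while a value outside the list, such as "float16", raises ValueError.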

model_config

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

n_heads

n_heads: int

n_kv_heads

n_kv_heads: int

n_layers

n_layers: int

n_refiner_layers

n_refiner_layers: int

norm_eps

norm_eps: float

qk_norm

qk_norm: bool

rope_theta

rope_theta: float

t_scale

t_scale: float
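The default values in the ZImageConfig signature can be cross-checked with a little arithmetic. This sketch only uses numbers copied from that signature. The interpretation is an assumption based on common transformer conventions, not something this page states: that the per-head width is dim // n_heads, that axes_dims splits that width across the rotary position axes, and that n_heads // n_kv_heads is the number of query heads sharing each key/value head.

```python
# Defaults copied from the ZImageConfig signature.
dim = 3840
n_heads = 30
n_kv_heads = 30
axes_dims = (32, 48, 48)

# Assumption: each attention head has width dim // n_heads.
head_dim = dim // n_heads

# Assumption: the rotary axes together cover the full head width.
rope_dim = sum(axes_dims)

# Assumption: query heads per key/value head (1 means no sharing).
kv_groups = n_heads // n_kv_heads

print(head_dim, rope_dim, kv_groups)  # prints: 128 128 1
```

Under these assumptions the defaults are self-consistent: 3840 / 30 = 128 and 32 + 48 + 48 = 128.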

ZImageTransformerModel

class max.pipelines.architectures.z_image_modulev3.ZImageTransformerModel(config, encoding, devices, weights, *, cache_config=None)

Bases: ComponentModel

Component wrapper for the compiled Z-Image transformer graph.

load_model()

load_model()

Load and return a runtime model instance.

Return type:

None

model

model: Callable[[...], Any]
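One reading consistent with a None return type on load_model() and a callable model attribute is that loading stores the callable on the instance. The sketch below illustrates that pattern only; FakeComponentModel and its run function are hypothetical stand-ins, and nothing here reflects how the real ComponentModel compiles or executes a graph.

```python
from typing import Any, Callable, Optional


class FakeComponentModel:
    """Hypothetical stand-in; NOT the real ComponentModel."""

    def __init__(self) -> None:
        # Nothing is loaded until load_model() is called.
        self.model: Optional[Callable[..., Any]] = None

    def load_model(self) -> None:
        # Stand-in for loading a compiled graph: here the "model" just
        # echoes its inputs back as a tuple.
        def run(*inputs: Any) -> Any:
            return tuple(inputs)

        self.model = run


component = FakeComponentModel()
component.load_model()

# After loading, the attribute can be called like a function.
result = component.model("latents", "timestep")
```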
