Python class

PixelModelInputs

class max.pipelines.lib.interfaces.PixelModelInputs(*, tokens, tokens_2=None, negative_tokens=None, negative_tokens_2=None, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, guidance=None, true_cfg_scale=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None)

Bases: object

A common input container for pixel-generation models.

This dataclass is designed to provide a consistent set of fields used across multiple pixel pipelines/models.

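Parameters:

  • tokens (TokenBuffer)
  • tokens_2 (TokenBuffer | None)
  • negative_tokens (TokenBuffer | None)
  • negative_tokens_2 (TokenBuffer | None)
  • timesteps (ndarray[tuple[Any, ...], dtype[float32]])
  • sigmas (ndarray[tuple[Any, ...], dtype[float32]])
  • latents (ndarray[tuple[Any, ...], dtype[float32]])
  • latent_image_ids (ndarray[tuple[Any, ...], dtype[float32]])
  • height (int)
  • width (int)
  • num_inference_steps (int)
  • guidance_scale (float)
  • guidance (ndarray[tuple[Any, ...], dtype[float32]] | None)
  • true_cfg_scale (float)
  • num_warmup_steps (int)
  • num_images_per_prompt (int)
  • input_image (Image | None)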

from_context()

classmethod from_context(context)

Build an instance from a context-like dict.

Policy:

  • If a key is missing: the dataclass default applies automatically.
  • If a key is present with value None: treat as missing and substitute the class default (including subclass overrides).

Parameters:

context (PixelGenerationContext)

Return type:

Self
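The None-handling policy above can be illustrated with a small, self-contained sketch. It assumes the context is a plain dict and uses a hypothetical stand-in class (HypotheticalInputs, with a hypothetical from_mapping classmethod) rather than the real PixelModelInputs or PixelGenerationContext; it only shows how "missing" and "present but None" can both fall back to the dataclass default, including a subclass override.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class HypotheticalInputs:
    """Simplified, hypothetical stand-in for PixelModelInputs."""

    height: int = 1024
    width: int = 1024
    num_inference_steps: int = 50
    guidance_scale: float = 3.5
    # Default is None, so an explicit None survives unchanged.
    guidance: Optional[Any] = None

    @classmethod
    def from_mapping(cls, context: Mapping[str, Any]) -> "HypotheticalInputs":
        kwargs = {}
        for f in fields(cls):
            value = context.get(f.name)
            if value is None:
                # Missing key or explicit None: omit it so the dataclass
                # default (including any subclass override) applies.
                continue
            kwargs[f.name] = value
        return cls(**kwargs)


@dataclass
class HypotheticalSmallInputs(HypotheticalInputs):
    # A subclass overriding defaults, as a model-specific class might.
    height: int = 512
    width: int = 512


inputs = HypotheticalSmallInputs.from_mapping({"height": None, "width": 768})
print(inputs.height, inputs.width, inputs.num_inference_steps)  # 512 768 50
```

Omitting a key and letting the dataclass constructor fill it in is what makes subclass overrides apply automatically: the sketch never has to look up a default itself.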

guidance

guidance: ndarray[tuple[Any, ...], dtype[float32]] | None = None

Optional guidance tensor.

  • Some pipelines precompute guidance weights/tensors (e.g., per-token weights, per-step weights).
  • None is meaningful here: it means “no explicit guidance tensor supplied”.
  • Unlike the scalar fields, None is preserved here rather than replaced with a default.
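As one illustration of a precomputed guidance tensor, guidance-distilled models such as FLUX.1-dev in Hugging Face diffusers receive the guidance scale as a per-sample tensor rather than applying CFG. The helper below (make_guidance, hypothetical) sketches that idea with NumPy; whether and how a given MAX pipeline fills this field is an assumption here.

```python
from typing import Optional

import numpy as np


def make_guidance(
    guidance_scale: float, batch_size: int, use_guidance_embeds: bool
) -> Optional[np.ndarray]:
    """Hypothetical helper: build a per-sample guidance tensor, or None."""
    if not use_guidance_embeds:
        # None means "no explicit guidance tensor supplied".
        return None
    # One float32 guidance value per sample in the batch.
    return np.full((batch_size,), guidance_scale, dtype=np.float32)


g = make_guidance(3.5, batch_size=2, use_guidance_embeds=True)
print(g)  # [3.5 3.5]
```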

guidance_scale

guidance_scale: float = 3.5

Guidance scale for classifier-free guidance (CFG).

  • A higher value typically increases adherence to the prompt but can reduce diversity.
  • This is expected to be a real float (not None).
  • If a context provides guidance_scale=None, from_context() substitutes the default.
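For reference, the textbook classifier-free guidance combination extrapolates from the unconditional prediction toward the conditional one by guidance_scale. This is a minimal NumPy sketch of that formula (apply_cfg is a hypothetical name); exactly where a given pipeline applies it is not specified here.

```python
import numpy as np


def apply_cfg(noise_uncond: np.ndarray, noise_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


uncond = np.zeros((1, 4, 2, 2), dtype=np.float32)
cond = np.ones((1, 4, 2, 2), dtype=np.float32)
guided = apply_cfg(uncond, cond, guidance_scale=3.5)
print(guided[0, 0, 0, 0])  # 3.5
```

With guidance_scale=1.0 the result equals the conditional prediction; larger values push further away from the unconditional one, which is why prompt adherence rises and diversity tends to drop.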

height

height: int = 1024

Output height in pixels.

  • This scalar must always hold a value (never None).
  • If a context provides height=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).

input_image

input_image: Image | None = None

Optional input image for image-to-image generation (PIL.Image.Image).
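Image-to-image pipelines usually convert the input image into a normalized array before encoding it. The sketch below assumes a common diffusion convention (RGB pixels mapped from [0, 255] to [-1, 1], layout converted from HWC to NCHW); preprocess_image is a hypothetical helper, and the exact normalization a MAX pipeline uses is an assumption.

```python
import numpy as np


def preprocess_image(image) -> np.ndarray:
    """Hypothetical preprocessing: HWC uint8 RGB -> [1, C, H, W] float32 in [-1, 1]."""
    pixels = np.asarray(image, dtype=np.float32)  # accepts PIL images or arrays
    pixels = pixels / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return pixels.transpose(2, 0, 1)[np.newaxis]  # HWC -> 1CHW


fake = np.full((4, 6, 3), 255, dtype=np.uint8)  # stand-in for a PIL.Image.Image
out = preprocess_image(fake)
print(out.shape)  # (1, 3, 4, 6)
```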

latent_image_ids

latent_image_ids: ndarray[tuple[Any, ...], dtype[float32]]

Optional latent image IDs / positional identifiers for latents.

  • Some pipelines attach per-latent identifiers for caching, routing, or conditioning.
  • Often used to avoid recomputation of image-id embeddings across steps.
  • If unused, it may remain empty.
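As an example of positional identifiers computed once and reused across steps, FLUX-style pipelines in Hugging Face diffusers build one (0, row, column) triple per latent patch. The following NumPy sketch assumes that layout; make_latent_image_ids is a hypothetical name, and other models may use a different scheme.

```python
import numpy as np


def make_latent_image_ids(latent_height: int, latent_width: int) -> np.ndarray:
    """Hypothetical FLUX-style positional IDs: one (0, row, col) triple per latent patch."""
    ids = np.zeros((latent_height, latent_width, 3), dtype=np.float32)
    ids[..., 1] = np.arange(latent_height, dtype=np.float32)[:, None]  # row index
    ids[..., 2] = np.arange(latent_width, dtype=np.float32)[None, :]  # column index
    return ids.reshape(latent_height * latent_width, 3)


ids = make_latent_image_ids(2, 3)
print(ids.shape)  # (6, 3)
```

Because the IDs depend only on the latent grid size, not on the step, computing them once per request avoids recomputing the image-id embeddings at every denoising step.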

latents

latents: ndarray[tuple[Any, ...], dtype[float32]]

Initial latent noise tensor (or initial latent state).

  • For diffusion/flow models, this is typically random noise seeded per request.
  • The shape depends on the model: commonly [B, C, H/8, W/8] for image latents or [B, T, C, H/8, W/8] for video latents.
  • If your pipeline generates latents internally, you may leave it empty. (Model-specific subclasses can enforce non-empty via __post_init__.)
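A minimal sketch of seeded initial noise for the image-latent shape [B, C, H/8, W/8] described above. It assumes a VAE downsampling factor of 8 and uses 4 latent channels purely for illustration (the channel count is model-specific); make_initial_latents is hypothetical.

```python
import numpy as np


def make_initial_latents(
    batch: int, channels: int, height: int, width: int, seed: int, vae_scale_factor: int = 8
) -> np.ndarray:
    """Hypothetical helper: seeded Gaussian noise of shape [B, C, H/8, W/8]."""
    rng = np.random.default_rng(seed)  # per-request seed -> reproducible noise
    shape = (batch, channels, height // vae_scale_factor, width // vae_scale_factor)
    return rng.standard_normal(shape, dtype=np.float32)


latents = make_initial_latents(1, 4, 1024, 1024, seed=42)
print(latents.shape)  # (1, 4, 128, 128)
```

Seeding a dedicated Generator per request, rather than the global NumPy state, keeps concurrent requests reproducible independently of each other.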

negative_tokens

negative_tokens: TokenBuffer | None = None

Negative prompt tokens for the primary encoder. Used for classifier-free guidance (CFG) or similar conditioning schemes. If your pipeline does not use negative prompts, leave as None.

negative_tokens_2

negative_tokens_2: TokenBuffer | None = None

Negative prompt tokens for the secondary encoder (for dual-encoder models). If the model is single-encoder or you do not use negative prompts, leave as None.

num_images_per_prompt

num_images_per_prompt: int = 1

Number of images/videos to generate per prompt.

  • Commonly used for “same prompt, multiple samples” behavior.
  • Must be > 0.
  • Video pipelines may also use this field; the image-centric name is kept for historical compatibility.

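One common way to implement "same prompt, multiple samples" is to validate the count and then repeat each prompt's conditioning along the batch axis. This sketch assumes the conditioning is a NumPy array with batch as the first axis; expand_for_num_images is a hypothetical helper, not part of the documented API.

```python
import numpy as np


def expand_for_num_images(prompt_embeds: np.ndarray, num_images_per_prompt: int) -> np.ndarray:
    """Hypothetical helper: repeat each prompt's embedding num_images_per_prompt times."""
    if num_images_per_prompt <= 0:
        raise ValueError("num_images_per_prompt must be > 0")
    # [B, ...] -> [B * num_images_per_prompt, ...], keeping copies of a prompt adjacent.
    return np.repeat(prompt_embeds, num_images_per_prompt, axis=0)


embeds = np.arange(2 * 3, dtype=np.float32).reshape(2, 3)  # 2 prompts, hidden size 3
expanded = expand_for_num_images(embeds, num_images_per_prompt=2)
print(expanded.shape)  # (4, 3)
```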
num_inference_steps

num_inference_steps: int = 50

Number of denoising/inference steps.

  • This scalar must always hold a value (never None).
  • If a context provides num_inference_steps=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).

num_warmup_steps

num_warmup_steps: int = 0

Number of warmup steps.

  • Used in some schedulers/pipelines to handle the initial steps differently (e.g., scheduler stabilization or cache warmup).
  • Must be >= 0.
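The source does not say how this value is derived. For comparison, Hugging Face diffusers pipelines compute a warmup count from the scheduler's timestep count and its order (mainly to decide when to advance the progress bar). The sketch below reproduces that formula under that assumption; compute_num_warmup_steps is hypothetical and not MAX's implementation.

```python
def compute_num_warmup_steps(
    num_timesteps: int, num_inference_steps: int, scheduler_order: int = 1
) -> int:
    """diffusers-style warmup count: timesteps beyond
    num_inference_steps * scheduler_order, clamped so the result is >= 0."""
    if num_inference_steps <= 0 or scheduler_order <= 0:
        raise ValueError("num_inference_steps and scheduler_order must be positive")
    return max(num_timesteps - num_inference_steps * scheduler_order, 0)


print(compute_num_warmup_steps(50, 50))  # 0
print(compute_num_warmup_steps(53, 50))  # 3
```

The clamp guarantees the documented constraint (>= 0) even when the scheduler produces fewer timesteps than num_inference_steps * scheduler_order.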

sigmas

sigmas: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed sigma schedule for denoising.

  • Usually a 1D float32 numpy array of length num_inference_steps corresponding to the noise level per step.
  • Some schedulers are sigma-based; others are timestep-based; some use both.
  • If unused, it may remain empty unless your model subclass requires it.
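As a concrete example of a sigma schedule of length num_inference_steps, FLUX pipelines in Hugging Face diffusers start from a linear schedule running from 1.0 down to 1/num_inference_steps. The sketch below assumes that flow-matching convention; linear_flow_sigmas is a hypothetical name, and other schedulers use different shapes.

```python
import numpy as np


def linear_flow_sigmas(num_inference_steps: int) -> np.ndarray:
    """Linear sigma schedule from 1.0 down to 1/num_inference_steps (float32, 1D)."""
    return np.linspace(1.0, 1.0 / num_inference_steps, num_inference_steps, dtype=np.float32)


sigmas = linear_flow_sigmas(50)
print(sigmas.shape, sigmas[0], sigmas[-1])  # (50,) 1.0 0.02
```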

timesteps

timesteps: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed denoising timestep schedule.

  • Usually a 1D float32 numpy array of length num_inference_steps (exact semantics depend on your scheduler).
  • If your pipeline precomputes the scheduler trajectory, pass it here.
  • Some models may not require explicit timesteps; in that case it may remain empty. (Model-specific subclasses can enforce non-empty via __post_init__.)
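Schedulers that use both sigmas and timesteps often derive one from the other. For instance, the flow-matching Euler scheduler in Hugging Face diffusers sets timestep = sigma * num_train_timesteps. This sketch assumes that convention and a training horizon of 1000 steps; timesteps_from_sigmas is hypothetical.

```python
import numpy as np

NUM_TRAIN_TIMESTEPS = 1000  # assumption: a typical training horizon


def timesteps_from_sigmas(
    sigmas: np.ndarray, num_train_timesteps: int = NUM_TRAIN_TIMESTEPS
) -> np.ndarray:
    """Flow-matching convention: timestep = sigma * num_train_timesteps."""
    return (sigmas * num_train_timesteps).astype(np.float32)


sigmas = np.linspace(1.0, 1.0 / 4, 4, dtype=np.float32)  # 4 inference steps
timesteps = timesteps_from_sigmas(sigmas)
print(timesteps.tolist())  # [1000.0, 750.0, 500.0, 250.0]
```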

tokens

tokens: TokenBuffer

Primary encoder token buffer. This is the main prompt representation consumed by the model’s text encoder. Required for all models.

tokens_2

tokens_2: TokenBuffer | None = None

Secondary encoder token buffer (for dual-encoder models), for example architectures that have a second text-encoder stream or that consume pooled embeddings. If the model is single-encoder, leave as None.

true_cfg_scale

true_cfg_scale: float = 1.0

“True CFG” scale used by certain pipelines/models.

  • Some architectures distinguish between the user-facing guidance_scale and an internal scale applied to a different normalization or conditioning pathway.
  • Defaults to 1.0 for pipelines that do not use this feature.
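As one example of how a pipeline might gate this feature, the FLUX pipeline in Hugging Face diffusers runs a second, negative-prompt pass only when true_cfg_scale > 1 and a negative prompt is available, then combines the two predictions. The sketch below assumes that behavior; apply_true_cfg is hypothetical and not the documented MAX logic.

```python
from typing import Optional

import numpy as np


def apply_true_cfg(
    noise_pos: np.ndarray, noise_neg: Optional[np.ndarray], true_cfg_scale: float
) -> np.ndarray:
    """Hypothetical helper: apply "true" CFG only when it is enabled and a
    negative prediction exists; otherwise return the positive prediction."""
    if true_cfg_scale <= 1.0 or noise_neg is None:
        return noise_pos  # default 1.0: feature disabled, single pass
    return noise_neg + true_cfg_scale * (noise_pos - noise_neg)


pos = np.ones((1, 4), dtype=np.float32)
neg = np.zeros((1, 4), dtype=np.float32)
print(apply_true_cfg(pos, neg, 4.0))  # [[4. 4. 4. 4.]]
```

Gating on the default of 1.0 lets pipelines that ignore this feature skip the extra negative-prompt forward pass entirely.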

width

width: int = 1024

Output width in pixels.

  • This scalar must always hold a value (never None).
  • If a context provides width=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).