`PixelModelInputs`

class max.pipelines.lib.interfaces.PixelModelInputs(*, tokens, tokens_2=None, negative_tokens=None, negative_tokens_2=None, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, guidance=None, true_cfg_scale=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None)

Bases: object

A common input container for pixel-generation models.

This dataclass provides a consistent set of fields shared across multiple pixel-generation pipelines and models.

Parameters:

tokens (TokenBuffer)
tokens_2 (TokenBuffer | None)
negative_tokens (TokenBuffer | None)
negative_tokens_2 (TokenBuffer | None)
timesteps (ndarray[tuple[Any, ...], dtype[float32]])
sigmas (ndarray[tuple[Any, ...], dtype[float32]])
latents (ndarray[tuple[Any, ...], dtype[float32]])
latent_image_ids (ndarray[tuple[Any, ...], dtype[float32]])
height (int)
width (int)
num_inference_steps (int)
guidance_scale (float)
guidance (ndarray[tuple[Any, ...], dtype[float32]] | None)
true_cfg_scale (float)
num_warmup_steps (int)
num_images_per_prompt (int)
input_image (Image | None)

`from_context()`

classmethod from_context(context)

Build an instance from a context-like dict.

Policy:

If a key is missing: the dataclass default applies automatically.
If a key is present with value None: treat as missing and substitute the class default (including subclass overrides).

Parameters: context (PixelGenerationContext)
Return type: Self

`guidance`

guidance: ndarray[tuple[Any, ...], dtype[float32]] | None = None

Optional guidance tensor.

Some pipelines precompute guidance weights/tensors (e.g., per-token weights, per-step weights).
None is meaningful here: it means “no explicit guidance tensor supplied”.
Unlike the scalar fields, where None is replaced by a default, a None value here is preserved.
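A pipeline might handle the distinction between None and a supplied tensor as shown below. resolve_guidance is a hypothetical helper, and plain lists stand in for float32 arrays. Filling the fallback with guidance_scale is an assumption about one possible pipeline, not documented behavior.

```python
from typing import Optional


def resolve_guidance(
    guidance: Optional[list[float]],
    guidance_scale: float,
    batch_size: int,
) -> list[float]:
    """Hypothetical helper: honor an explicit guidance tensor when present."""
    if guidance is not None:
        # The caller supplied precomputed weights; use them unchanged.
        return list(guidance)
    # No explicit tensor: broadcast the scalar guidance_scale per batch item.
    return [guidance_scale] * batch_size


print(resolve_guidance(None, 3.5, 2))        # [3.5, 3.5]
print(resolve_guidance([1.0, 2.0], 3.5, 2))  # [1.0, 2.0]
```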

`guidance_scale`

guidance_scale: float = 3.5

Guidance scale for classifier-free guidance (CFG).

A higher value typically increases adherence to the prompt but can reduce diversity.
This is expected to be a real float (not None).
If a context provides guidance_scale=None, from_context() substitutes the default.
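In the usual classifier-free guidance formulation, the final prediction is uncond + guidance_scale * (cond - uncond). The sketch below applies that formula element-wise to plain lists. Check each model individually: a given MAX pipeline may apply guidance_scale this way, or it may instead pass the value to a guidance-distilled model.

```python
def apply_cfg(uncond: list[float], cond: list[float], guidance_scale: float) -> list[float]:
    """Classifier-free guidance combination, applied element-wise."""
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond predictions must have the same length")
    # Move from the unconditional prediction toward the conditional one
    # (and past it when guidance_scale > 1).
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


print(apply_cfg([0.0, 1.0], [1.0, 1.0], 3.5))  # [3.5, 1.0]
```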

`height`

height: int = 1024

Output height in pixels.

This scalar must always hold an int, never None.
If a context provides height=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).

`input_image`

input_image: Image | None = None

Optional input image for image-to-image generation (PIL.Image.Image).

`latent_image_ids`

latent_image_ids: ndarray[tuple[Any, ...], dtype[float32]]

Optional latent image IDs / positional identifiers for latents.

Some pipelines attach per-latent identifiers for caching, routing, or conditioning.
Often used to avoid recomputation of image-id embeddings across steps.
If unused, it may remain empty.

`latents`

latents: ndarray[tuple[Any, ...], dtype[float32]]

Initial latent noise tensor (or initial latent state).

For diffusion/flow models, this is typically random noise seeded per request.
Shape depends on model: commonly [B, C, H/8, W/8] for image latents, or [B, T, C, H/8, W/8] for video latents.
If your pipeline generates latents internally, you may leave it empty. (Model-specific subclasses can enforce non-empty via __post_init__.)
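The shape arithmetic and per-request seeding described above can be sketched as follows. The downsampling factor of 8 comes from the [B, C, H/8, W/8] layout. The channel count of 4 and the helper names image_latent_shape and seeded_noise are hypothetical, and a flat Python list stands in for a float32 array.

```python
import math
import random


def image_latent_shape(
    batch: int, channels: int, height: int, width: int, downsample: int = 8
) -> tuple[int, int, int, int]:
    """Return [B, C, H/downsample, W/downsample] for an image latent."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by downsample")
    return (batch, channels, height // downsample, width // downsample)


def seeded_noise(shape: tuple[int, ...], seed: int) -> list[float]:
    """Flat list of standard-normal samples that is reproducible per seed."""
    rng = random.Random(seed)  # per-request generator, independent of global state
    return [rng.gauss(0.0, 1.0) for _ in range(math.prod(shape))]


# channels=4 is a hypothetical value; real latent channel counts vary by model.
shape = image_latent_shape(batch=1, channels=4, height=1024, width=1024)
print(shape)  # (1, 4, 128, 128)

# A small shape keeps the demo cheap; the same seed reproduces the same noise.
noise = seeded_noise((1, 4, 8, 8), seed=42)
print(len(noise))  # 256
```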

`negative_tokens`

negative_tokens: TokenBuffer | None = None

Negative prompt tokens for the primary encoder. Used for classifier-free guidance (CFG) or similar conditioning schemes. If your pipeline does not use negative prompts, leave as None.

`negative_tokens_2`

negative_tokens_2: TokenBuffer | None = None

Negative prompt tokens for the secondary encoder (for dual-encoder models). If the model is single-encoder or you do not use negative prompts, leave as None.

`num_images_per_prompt`

num_images_per_prompt: int = 1

Number of images/videos to generate per prompt.

Commonly used for “same prompt, multiple samples” behavior.
Must be > 0.
For video generation, the same field is used despite its image-oriented name, for historical compatibility.
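With num_images_per_prompt set, the effective batch is the number of prompts times num_images_per_prompt. This minimal sketch uses strings in place of token buffers. expand_prompts is a hypothetical helper, and keeping copies of a prompt adjacent is one possible ordering, not a documented one.

```python
def expand_prompts(prompts: list[str], num_images_per_prompt: int) -> list[str]:
    """Repeat each prompt so the batch holds one entry per requested image."""
    if num_images_per_prompt <= 0:
        raise ValueError("num_images_per_prompt must be > 0")
    # Keep copies of the same prompt adjacent: [a, a, b, b] rather than [a, b, a, b].
    return [p for p in prompts for _ in range(num_images_per_prompt)]


print(expand_prompts(["a cat", "a dog"], 2))
# ['a cat', 'a cat', 'a dog', 'a dog']
```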

`num_inference_steps`

num_inference_steps: int = 50

Number of denoising/inference steps.

This scalar must always hold an int, never None.
If a context provides num_inference_steps=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).

`num_warmup_steps`

num_warmup_steps: int = 0

Number of warmup steps.

Some schedulers and pipelines use this to handle the initial steps differently (for example, for scheduler stabilization or cache warmup).
Must be >= 0.
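A pipeline that treats warmup steps differently might partition the step indices as shown below. split_steps is hypothetical. Only the >= 0 constraint comes from this reference; the upper-bound check against num_inference_steps is an assumption.

```python
def split_steps(num_inference_steps: int, num_warmup_steps: int) -> tuple[range, range]:
    """Hypothetical helper: partition step indices into warmup and regular phases."""
    if num_warmup_steps < 0:
        raise ValueError("num_warmup_steps must be >= 0")
    # Assumed constraint: warmup cannot cover more steps than exist.
    if num_warmup_steps > num_inference_steps:
        raise ValueError("num_warmup_steps cannot exceed num_inference_steps")
    return range(num_warmup_steps), range(num_warmup_steps, num_inference_steps)


warmup, regular = split_steps(num_inference_steps=50, num_warmup_steps=3)
print(list(warmup), len(regular))  # [0, 1, 2] 47
```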

`sigmas`

sigmas: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed sigma schedule for denoising.

Usually a 1D float32 numpy array of length num_inference_steps corresponding to the noise level per step.
Some schedulers are sigma-based; others are timestep-based; some use both.
If unused, it may remain empty unless your model subclass requires it.
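As one concrete example of a precomputed schedule, some flow-matching pipelines space sigmas evenly from 1.0 down to 1/num_inference_steps. They then derive timesteps as sigma * 1000. Both the spacing and the 1000 multiplier are assumptions for illustration; not every MAX model uses this schedule. The sketch also omits any sigma shifting a scheduler may apply.

```python
def linear_sigmas(num_inference_steps: int) -> list[float]:
    """Evenly spaced sigmas from 1.0 down to 1/num_inference_steps."""
    n = num_inference_steps
    if n <= 0:
        raise ValueError("num_inference_steps must be > 0")
    if n == 1:
        return [1.0]
    end = 1.0 / n
    step = (1.0 - end) / (n - 1)
    return [1.0 - i * step for i in range(n)]


sigmas = linear_sigmas(4)
# Assumed flow-matching convention: timestep = sigma * 1000.
timesteps = [s * 1000.0 for s in sigmas]
print(sigmas)     # [1.0, 0.75, 0.5, 0.25]
print(timesteps)  # [1000.0, 750.0, 500.0, 250.0]
```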

`timesteps`

timesteps: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed denoising timestep schedule.

Usually a 1D float32 numpy array of length num_inference_steps (exact semantics depend on your scheduler).
If your pipeline precomputes the scheduler trajectory, you pass it here.
Some models may not require explicit timesteps; in that case it may remain empty. (Model-specific subclasses can enforce non-empty via __post_init__.)

`tokens`

tokens: TokenBuffer

Primary encoder token buffer. This is the main prompt representation consumed by the model’s text encoder. Required for all models.

`tokens_2`

tokens_2: TokenBuffer | None = None

Secondary encoder token buffer (for dual-encoder models). For example, this applies to architectures with a second text-encoder stream, or a second encoder that produces pooled embeddings. If the model is single-encoder, leave as None.
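The four token fields follow the same convention: None means the corresponding encoder pass is not needed. This minimal sketch uses a hypothetical FakeTokenBuffer in place of TokenBuffer. A hypothetical active_streams helper lists which passes a pipeline would run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTokenBuffer:
    """Hypothetical stand-in for TokenBuffer: just a list of token ids."""

    ids: list[int]


def active_streams(
    tokens: FakeTokenBuffer,
    tokens_2: Optional[FakeTokenBuffer] = None,
    negative_tokens: Optional[FakeTokenBuffer] = None,
    negative_tokens_2: Optional[FakeTokenBuffer] = None,
) -> list[str]:
    """Name the encoder passes a pipeline would need; None means skip."""
    candidates = {
        "positive/encoder_1": tokens,
        "positive/encoder_2": tokens_2,
        "negative/encoder_1": negative_tokens,
        "negative/encoder_2": negative_tokens_2,
    }
    # dicts keep insertion order, so the result order is stable.
    return [name for name, buf in candidates.items() if buf is not None]


print(active_streams(FakeTokenBuffer([1, 2, 3])))
# ['positive/encoder_1']
print(active_streams(FakeTokenBuffer([1, 2]), tokens_2=FakeTokenBuffer([7, 8])))
# ['positive/encoder_1', 'positive/encoder_2']
```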

`true_cfg_scale`

true_cfg_scale: float = 1.0

“True CFG” scale used by certain pipelines/models.

Some architectures distinguish between the user-facing guidance_scale and an internal scale applied to a different normalization or conditioning pathway.
The default of 1.0 suits pipelines that do not use this feature.
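Because uncond + 1.0 * (cond - uncond) reduces to cond, a true_cfg_scale of exactly 1.0 leaves the prediction unchanged. That is why 1.0 works as an off switch. The gate below is a hypothetical sketch, and uses_true_cfg is not part of the documented API. It assumes that true CFG uses the negative prompt as the unconditional branch.

```python
from typing import Optional


def uses_true_cfg(true_cfg_scale: float, negative_tokens: Optional[list[int]]) -> bool:
    """Hypothetical gate: run the extra negative-prompt pass only when it matters."""
    # At exactly 1.0 the CFG combination returns cond unchanged, so the
    # negative pass is wasted work; without negative tokens there is nothing to run.
    return true_cfg_scale != 1.0 and negative_tokens is not None


print(uses_true_cfg(1.0, [5, 6]))  # False (default scale)
print(uses_true_cfg(4.0, [5, 6]))  # True
print(uses_true_cfg(4.0, None))    # False (no negative prompt)
```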

`width`

width: int = 1024

Output width in pixels.

This scalar must always hold an int, never None.
If a context provides width=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).
