Python class
PixelModelInputs
class max.pipelines.lib.interfaces.PixelModelInputs(*, tokens, tokens_2=None, negative_tokens=None, negative_tokens_2=None, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, guidance=None, true_cfg_scale=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None)
Bases: object
A common input container for pixel-generation models.
This dataclass is designed to provide a consistent set of fields used across multiple pixel pipelines/models.
Parameters:
- tokens (TokenBuffer)
- tokens_2 (TokenBuffer | None)
- negative_tokens (TokenBuffer | None)
- negative_tokens_2 (TokenBuffer | None)
- timesteps (ndarray[tuple[Any, ...], dtype[float32]])
- sigmas (ndarray[tuple[Any, ...], dtype[float32]])
- latents (ndarray[tuple[Any, ...], dtype[float32]])
- latent_image_ids (ndarray[tuple[Any, ...], dtype[float32]])
- height (int)
- width (int)
- num_inference_steps (int)
- guidance_scale (float)
- guidance (ndarray[tuple[Any, ...], dtype[float32]] | None)
- true_cfg_scale (float)
- num_warmup_steps (int)
- num_images_per_prompt (int)
- input_image (Image | None)
from_context()
classmethod from_context(context)
Build an instance from a context-like dict.
Policy:
- If a key is missing: the dataclass default applies automatically.
- If a key is present with value None: treat as missing and substitute the class default (including subclass overrides).
Parameters:
context (PixelGenerationContext)
Return type:
Self
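The substitution policy above can be sketched with a simplified stand-alone dataclass. This is an illustrative sketch, not the real MAX implementation; the field subset and the plain-dict context are assumptions:

```python
from dataclasses import dataclass, fields

# Simplified stand-in for PixelModelInputs, restricted to a few scalar
# fields. Real contexts are PixelGenerationContext objects, not dicts.
@dataclass
class _Inputs:
    height: int = 1024
    width: int = 1024
    guidance_scale: float = 3.5

    @classmethod
    def from_context(cls, context: dict) -> "_Inputs":
        kwargs = {}
        for f in fields(cls):
            value = context.get(f.name)
            if value is None:
                # Missing key OR explicit None: skip it, so the class
                # default applies (subclass overrides apply via cls).
                continue
            kwargs[f.name] = value
        return cls(**kwargs)

inputs = _Inputs.from_context({"height": None, "width": 512})
# height=None falls back to the default 1024; width is taken from the context.
```

Because the fallback goes through `cls`, a subclass that overrides a default (e.g. a model that wants `height: int = 512`) gets its own default substituted, matching the "including subclass overrides" note above.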
guidance
guidance: ndarray[tuple[Any, ...], dtype[float32]] | None = None
Optional guidance tensor.
- Some pipelines precompute guidance weights/tensors (e.g., per-token weights, per-step weights).
- None is meaningful here: it means “no explicit guidance tensor supplied”.
- Unlike scalar fields, None is preserved (not replaced).
guidance_scale
guidance_scale: float = 3.5
Guidance scale for classifier-free guidance (CFG).
- A higher value typically increases adherence to the prompt but can reduce diversity.
- This is expected to be a real float (not None).
- If a context provides guidance_scale=None, from_context() substitutes the default.
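The standard classifier-free guidance combination shows how this scalar is typically applied. This is the common textbook formulation, not necessarily the exact arithmetic inside any particular MAX pipeline:

```python
import numpy as np

def apply_cfg(noise_uncond: np.ndarray, noise_cond: np.ndarray,
              guidance_scale: float) -> np.ndarray:
    """Blend unconditional and conditional predictions.

    guidance_scale == 1.0 reduces to the conditional prediction;
    larger values push the result further toward the prompt.
    """
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

uncond = np.zeros((1, 4), dtype=np.float32)
cond = np.ones((1, 4), dtype=np.float32)
out = apply_cfg(uncond, cond, 3.5)
```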
height
height: int = 1024
Output height in pixels.
- This is a required scalar (not None).
- If a context provides height=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).
input_image
input_image: Image | None = None
Optional input image for image-to-image generation (PIL.Image.Image).
latent_image_ids
Optional latent image IDs / positional identifiers for latents.
- Some pipelines attach per-latent identifiers for caching, routing, or conditioning.
- Often used to avoid recomputation of image-id embeddings across steps.
- If unused, it may remain empty.
latents
Initial latent noise tensor (or initial latent state).
- For diffusion/flow models, this is typically random noise seeded per request.
- Shape depends on model: commonly [B, C, H/8, W/8] for image latents, or [B, T, C, H/8, W/8] for video latents.
- If your pipeline generates latents internally, you may leave it empty. (Model-specific subclasses can enforce non-empty via __post_init__.)
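A per-request seeded noise tensor in the `[B, C, H/8, W/8]` layout described above can be produced like this. The helper name and the channel count are assumptions for illustration; the real channel count and downscale factor are model-specific:

```python
import numpy as np

# Hypothetical helper: seeds per-request noise in the common
# [B, C, H/8, W/8] image-latent layout.
def make_initial_latents(batch: int, channels: int, height: int,
                         width: int, seed: int) -> np.ndarray:
    rng = np.random.default_rng(seed)  # per-request seed for reproducibility
    return rng.standard_normal(
        (batch, channels, height // 8, width // 8), dtype=np.float32)

latents = make_initial_latents(1, 4, 1024, 1024, seed=42)
# shape (1, 4, 128, 128), dtype float32
```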
negative_tokens
negative_tokens: TokenBuffer | None = None
Negative prompt tokens for the primary encoder. Used for classifier-free guidance (CFG) or similar conditioning schemes. If your pipeline does not use negative prompts, leave as None.
negative_tokens_2
negative_tokens_2: TokenBuffer | None = None
Negative prompt tokens for the secondary encoder (for dual-encoder models). If the model is single-encoder or you do not use negative prompts, leave as None.
num_images_per_prompt
num_images_per_prompt: int = 1
Number of images/videos to generate per prompt.
- Commonly used for “same prompt, multiple samples” behavior.
- Must be > 0.
- For video generation, the field name is retained for compatibility even though each "image" is a video sample.

num_inference_steps
num_inference_steps: int = 50
Number of denoising/inference steps.
- This is a required scalar (not None).
- If a context provides num_inference_steps=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).
num_warmup_steps
num_warmup_steps: int = 0
Number of warmup steps.
- Used in some schedulers/pipelines to handle initial steps differently (e.g., scheduler stabilization, cache warmup, etc.).
- Must be >= 0.
sigmas
Precomputed sigma schedule for denoising.
- Usually a 1D float32 numpy array of length num_inference_steps corresponding to the noise level per step.
- Some schedulers are sigma-based; others are timestep-based; some use both.
- If unused, it may remain empty unless your model subclass requires it.
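A minimal sigma schedule matching the description above can be built with a linear ramp. This is an illustrative sketch only; real schedulers (Karras, flow-matching, etc.) use other spacings, and the endpoints here are assumed values:

```python
import numpy as np

# Illustrative schedule: a 1D float32 array of length num_inference_steps,
# descending from the highest noise level to the lowest.
def linear_sigmas(num_inference_steps: int,
                  sigma_max: float = 1.0,
                  sigma_min: float = 0.0) -> np.ndarray:
    return np.linspace(sigma_max, sigma_min, num_inference_steps,
                       dtype=np.float32)

sigmas = linear_sigmas(50)
# sigmas[0] is the highest noise level; sigmas[-1] the lowest.
```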
timesteps
Precomputed denoising timestep schedule.
- Usually a 1D float32 numpy array of length num_inference_steps (exact semantics depend on your scheduler).
- If your pipeline precomputes the scheduler trajectory, you pass it here.
- Some models may not require explicit timesteps; in that case it may remain empty. (Model-specific subclasses can enforce non-empty via __post_init__.)
tokens
tokens: TokenBuffer
Primary encoder token buffer. This is the main prompt representation consumed by the model’s text encoder. Required for all models.
tokens_2
tokens_2: TokenBuffer | None = None
Secondary encoder token buffer (for dual-encoder models). Examples: architectures that have a second text encoder stream or pooled embeddings. If the model is single-encoder, leave as None.
true_cfg_scale
true_cfg_scale: float = 1.0
“True CFG” scale used by certain pipelines/models.
- Some architectures distinguish between the user-facing guidance_scale and an internal scale applied to a different normalization or conditioning pathway.
- Defaults to 1.0 for pipelines that do not use this feature.
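One common interpretation found in open-source pipelines applies the true-CFG scale between the positive-prompt and negative-prompt conditional predictions. This is a hedged sketch of that convention, not the MAX implementation:

```python
import numpy as np

def apply_true_cfg(noise_pos: np.ndarray, noise_neg: np.ndarray,
                   true_cfg_scale: float) -> np.ndarray:
    # With true_cfg_scale == 1.0 this reduces to the positive-prompt
    # prediction, matching the "feature disabled" default above.
    return noise_neg + true_cfg_scale * (noise_pos - noise_neg)

out = apply_true_cfg(np.ones((1, 4), dtype=np.float32),
                     np.zeros((1, 4), dtype=np.float32),
                     true_cfg_scale=1.0)
```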
width
width: int = 1024
Output width in pixels.
- This is a required scalar (not None).
- If a context provides width=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).