Python class
PixelContext
class max.pipelines.PixelContext(*, tokens, request_id=<factory>, model_name='', mask=None, tokens_2=None, negative_tokens=None, negative_mask=None, negative_tokens_2=None, explicit_negative_prompt=False, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, text_ids=<factory>, negative_text_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, true_cfg_scale=1.0, strength=0.6, cfg_normalization=False, cfg_truncation=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None, input_images=None, prompt_images=None, vae_condition_images=None, output_format='jpeg', residual_threshold=None, status=GenerationStatus.ACTIVE)
Bases: object
A model-ready context for image/video generation requests.
Per the design doc, this class contains only numeric data that the model will execute against. User-facing strings (prompt, negative_prompt) are consumed during tokenization and do not appear here.
All preprocessing is performed by PixelGenerationTokenizer.new_context():
- Prompt tokenization -> tokens field
- Negative prompt tokenization -> negative_tokens field
- Timestep schedule computation -> timesteps field
- Initial noise generation -> latents field
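The division of labor above — strings consumed during tokenization, only numeric arrays stored on the context — can be illustrated with a simplified stand-in. The names and shapes below (toy tokenizer, 8x latent downsampling factor, 16 latent channels) are assumptions for illustration, not the real `max.pipelines` implementation:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MiniPixelContext:
    """Simplified stand-in for PixelContext: numeric data only, no prompt strings."""
    tokens: np.ndarray
    timesteps: np.ndarray
    latents: np.ndarray


def new_context(prompt: str, num_inference_steps: int = 50,
                height: int = 1024, width: int = 1024) -> MiniPixelContext:
    # Toy tokenization: map characters to integer IDs. The real pipeline uses
    # PixelGenerationTokenizer; the point is that the prompt string is
    # consumed here and never stored on the context.
    tokens = np.array([ord(c) for c in prompt], dtype=np.int64)
    # Toy linear timestep schedule from 1.0 down to 0.0 (real schedules vary).
    timesteps = np.linspace(1.0, 0.0, num_inference_steps, dtype=np.float32)
    # Initial Gaussian noise in an 8x-downsampled latent space (assumed factor).
    latents = np.random.randn(1, 16, height // 8, width // 8).astype(np.float32)
    return MiniPixelContext(tokens, timesteps, latents)


ctx = new_context("a photo of a cat", num_inference_steps=4)
```

Note that `MiniPixelContext` carries no `prompt` attribute at all, mirroring the design constraint described above.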
Parameters:
- tokens (TokenBuffer) – Tokenized prompt token IDs.
- request_id (RequestID) – A unique identifier for this generation request.
- model_name (str) – Name of the model being used.
- mask (ndarray[tuple[Any, ...], dtype[bool]] | None)
- tokens_2 (TokenBuffer | None)
- negative_tokens (TokenBuffer | None) – Tokenized negative prompt token IDs.
- negative_mask (ndarray[tuple[Any, ...], dtype[bool]] | None)
- negative_tokens_2 (TokenBuffer | None)
- explicit_negative_prompt (bool)
- timesteps (ndarray[tuple[Any, ...], dtype[float32]]) – Precomputed timestep schedule for denoising.
- sigmas (ndarray[tuple[Any, ...], dtype[float32]])
- latents (ndarray[tuple[Any, ...], dtype[float32]]) – Precomputed initial noise (latents).
- latent_image_ids (ndarray[tuple[Any, ...], dtype[float32]])
- text_ids (ndarray[tuple[Any, ...], dtype[int64]])
- negative_text_ids (ndarray[tuple[Any, ...], dtype[int64]])
- height (int) – Height of the generated image/video in pixels.
- width (int) – Width of the generated image/video in pixels.
- num_inference_steps (int) – Number of denoising steps.
- guidance_scale (float) – Guidance scale for classifier-free guidance.
- true_cfg_scale (float)
- strength (float)
- cfg_normalization (bool)
- cfg_truncation (float)
- num_warmup_steps (int)
- num_images_per_prompt (int) – Number of images/videos to generate per prompt.
- input_image (ndarray[tuple[Any, ...], dtype[uint8]] | None) – Optional HWC uint8 numpy array for image-to-image generation.
- input_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None) – Optional list of input images for image-to-image generation.
- prompt_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None)
- vae_condition_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None)
- output_format (str)
- residual_threshold (float | None)
- status (GenerationStatus)
cfg_normalization
cfg_normalization: bool = False
cfg_truncation
cfg_truncation: float = 1.0
compute_num_available_steps()
compute_num_available_steps(max_seq_len)
Compute the number of available steps for scheduler compatibility.
For image and video generation, this returns the number of inference steps.
explicit_negative_prompt
explicit_negative_prompt: bool = False
Whether the request explicitly supplied a negative prompt.
guidance_scale
guidance_scale: float = 3.5
height
height: int = 1024
input_image
input_image: ndarray[tuple[Any, ...], dtype[uint8]] | None = None
Input image as a numpy array (H, W, C) in uint8 format for image-to-image generation.
input_images
input_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Input images as a list of numpy arrays (H, W, C) in uint8 format for image-to-image generation.
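Images often arrive as float arrays or in channel-first layout, so they need coercion into the (H, W, C) uint8 format documented here. A small helper can do that; `to_hwc_uint8` is an illustrative name, not part of the `max.pipelines` API:

```python
import numpy as np


def to_hwc_uint8(img: np.ndarray) -> np.ndarray:
    """Coerce an image into the (H, W, C) uint8 layout expected by
    input_image / input_images. Handles float images in [0, 1] and
    channel-first (C, H, W) inputs. Illustrative helper only."""
    # Channel-first (C, H, W) with a plausible channel count? Move channels last.
    if img.ndim == 3 and img.shape[0] in (1, 3, 4) and img.shape[2] not in (1, 3, 4):
        img = np.transpose(img, (1, 2, 0))  # CHW -> HWC
    # Float pixels in [0, 1] -> [0, 255].
    if np.issubdtype(img.dtype, np.floating):
        img = np.clip(img, 0.0, 1.0) * 255.0
    return img.astype(np.uint8)


chw = np.random.rand(3, 512, 512).astype(np.float32)  # channel-first float image
hwc = to_hwc_uint8(chw)  # (512, 512, 3) uint8
```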
is_done
property is_done: bool
Whether the request has completed generation.
latent_image_ids
Precomputed latent image IDs for generation.
latents
Precomputed initial noise (latents) for generation.
mask
Mask for the text encoder's attention.
model_name
model_name: str = ''
negative_mask
negative_mask: ndarray[tuple[Any, ...], dtype[bool]] | None = None
Mask for the negative text encoder path.
negative_text_ids
Precomputed text position IDs for the negative prompt.
negative_tokens
negative_tokens: TokenBuffer | None = None
Negative prompt tokens for the primary encoder.
negative_tokens_2
negative_tokens_2: TokenBuffer | None = None
Negative prompt tokens for the secondary encoder. None for single-encoder models.
num_images_per_prompt
num_images_per_prompt: int = 1
num_inference_steps
num_inference_steps: int = 50
num_warmup_steps
num_warmup_steps: int = 0
output_format
output_format: str = 'jpeg'
Image encoding format for the output (e.g., 'jpeg', 'png', 'webp').
prompt_images
prompt_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Optional prompt-conditioning images prepared by the tokenizer.
request_id
request_id: RequestID
reset()
reset()
Resets the context's state.
Return type:
None
residual_threshold
Per-request residual threshold for FBCache. None uses the pipeline default.
sigmas
Precomputed sigma schedule for denoising.
status
status: GenerationStatus = 'active'
strength
strength: float = 0.6
text_ids
Precomputed text position IDs, shape (B, seq_len, 4), int64.
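The documented (B, seq_len, 4) int64 layout can be reproduced with numpy. Zero-filling the positions for text tokens follows the common Flux-style convention, but the exact ID scheme used by this pipeline is an assumption here:

```python
import numpy as np


def make_text_ids(batch: int, seq_len: int) -> np.ndarray:
    # Zero-filled position IDs with the documented shape (B, seq_len, 4).
    # Zero-filling for text tokens mirrors the common Flux-style convention;
    # the actual scheme used by the pipeline may differ.
    return np.zeros((batch, seq_len, 4), dtype=np.int64)


text_ids = make_text_ids(1, 77)  # e.g. a 77-token prompt
```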
timesteps
Precomputed timestep schedule for denoising.
to_generation_output()
to_generation_output()
Convert this context to a GenerationOutput object.
Return type:
GenerationOutput
tokens
tokens: TokenBuffer
Primary encoder tokens.
tokens_2
tokens_2: TokenBuffer | None = None
Secondary encoder tokens. None for single-encoder models.
true_cfg_scale
true_cfg_scale: float = 1.0
update()
update(latents)
Update the context with newly generated latents/image data.
vae_condition_images
vae_condition_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Optional VAE-conditioning images prepared by the tokenizer.
Qwen image edit keeps prompt-conditioning images and VAE-conditioning images separate because the multimodal prompt encoder and the VAE latent conditioning path use different resize targets.
width
width: int = 1024