Python class
PixelContext
class max.pipelines.PixelContext(*, tokens, request_id=<factory>, model_name='', mask=None, tokens_2=None, negative_tokens=None, negative_mask=None, negative_tokens_2=None, explicit_negative_prompt=False, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, text_ids=<factory>, negative_text_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, true_cfg_scale=1.0, strength=0.6, cfg_normalization=False, cfg_truncation=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None, input_images=None, prompt_images=None, vae_condition_images=None, output_format='jpeg', residual_threshold=None, status=GenerationStatus.ACTIVE)
Bases: object
A model-ready context for image/video generation requests.
Per the design doc, this class contains only numeric data that the model will execute against. User-facing strings (prompt, negative_prompt) are consumed during tokenization and do not appear here.
All preprocessing is performed by PixelGenerationTokenizer.new_context():
- Prompt tokenization -> tokens field
- Negative prompt tokenization -> negative_tokens field
- Timestep schedule computation -> timesteps field
- Initial noise generation -> latents field
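The numeric artifacts produced by that preprocessing can be sketched with plain NumPy. This is an illustrative sketch only: the real work happens in PixelGenerationTokenizer.new_context(), and the schedule shape, latent channel count, and VAE downsampling factor below are assumptions, not the library's actual math.

```python
import numpy as np

# Illustrative only: mimics the kind of numeric data new_context() precomputes.
num_inference_steps = 50
height, width = 1024, 1024

# A simple linearly spaced timestep schedule from 1.0 down to 0.0
# (real schedulers use model-specific spacing and sigma shifts).
timesteps = np.linspace(1.0, 0.0, num_inference_steps, dtype=np.float32)

# Initial Gaussian noise in latent space, assuming an 8x VAE downsampling
# factor and 16 latent channels (both model-dependent assumptions).
latents = np.random.randn(1, 16, height // 8, width // 8).astype(np.float32)

print(timesteps.shape, latents.shape)
```

The context then carries only these arrays; the prompt strings themselves are consumed during tokenization and never stored here.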
Parameters:
- tokens (TokenBuffer) – Tokenized prompt IDs.
- request_id (RequestID) – A unique identifier for this generation request.
- model_name (str) – Name of the model being used.
- mask (ndarray[tuple[Any, ...], dtype[bool]] | None)
- tokens_2 (TokenBuffer | None)
- negative_tokens (TokenBuffer | None) – Tokenized negative prompt IDs.
- negative_mask (ndarray[tuple[Any, ...], dtype[bool]] | None)
- negative_tokens_2 (TokenBuffer | None)
- explicit_negative_prompt (bool)
- timesteps (ndarray[tuple[Any, ...], dtype[float32]]) – Precomputed timestep schedule for denoising.
- sigmas (ndarray[tuple[Any, ...], dtype[float32]])
- latents (ndarray[tuple[Any, ...], dtype[float32]]) – Precomputed initial noise (latents).
- latent_image_ids (ndarray[tuple[Any, ...], dtype[float32]])
- text_ids (ndarray[tuple[Any, ...], dtype[int64]])
- negative_text_ids (ndarray[tuple[Any, ...], dtype[int64]])
- height (int) – Height of the generated image/video in pixels.
- width (int) – Width of the generated image/video in pixels.
- num_inference_steps (int) – Number of denoising steps.
- guidance_scale (float) – Guidance scale for classifier-free guidance.
- true_cfg_scale (float)
- strength (float)
- cfg_normalization (bool)
- cfg_truncation (float)
- num_warmup_steps (int)
- num_images_per_prompt (int) – Number of images/videos to generate per prompt.
- input_image (ndarray[tuple[Any, ...], dtype[uint8]] | None) – Optional HWC uint8 numpy array for image-to-image generation.
- input_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None) – Optional list of input images for image-to-image generation.
- prompt_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None)
- vae_condition_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None)
- output_format (str)
- residual_threshold (float | None)
- status (GenerationStatus)
cfg_normalization
cfg_normalization: bool = False
cfg_truncation
cfg_truncation: float = 1.0
compute_num_available_steps()
compute_num_available_steps(max_seq_len)
Compute the number of available steps for scheduler compatibility.
For image and video generation, this returns the number of inference steps.
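As documented, for image and video generation the available step count is simply the configured number of inference steps, independent of max_seq_len. A minimal mirror of that documented behavior (illustrative only, not the library's implementation):

```python
# Mirrors the documented behavior: for image/video generation the
# available step count equals num_inference_steps, regardless of
# max_seq_len. Illustrative sketch, not the library's source.
def compute_num_available_steps(num_inference_steps: int, max_seq_len: int) -> int:
    return num_inference_steps

print(compute_num_available_steps(50, 4096))  # 50
```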
explicit_negative_prompt
explicit_negative_prompt: bool = False
Whether the request explicitly supplied a negative prompt.
guidance_scale
guidance_scale: float = 3.5
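Classifier-free guidance combines an unconditional and a conditional prediction, with guidance_scale controlling how far the result is pushed toward the prompt. The sketch below shows the standard CFG formula; the pipeline's exact implementation may differ, particularly when cfg_normalization or cfg_truncation are in effect.

```python
import numpy as np

# Standard classifier-free guidance combination. A sketch of what
# guidance_scale controls, not the pipeline's exact implementation.
def apply_cfg(noise_uncond: np.ndarray,
              noise_cond: np.ndarray,
              guidance_scale: float = 3.5) -> np.ndarray:
    # guidance_scale = 1.0 reduces to the conditional prediction;
    # larger values push the output further toward the prompt.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

uncond = np.zeros((1, 4))
cond = np.ones((1, 4))
print(apply_cfg(uncond, cond, 3.5))  # every element is 3.5
```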
height
height: int = 1024
input_image
input_image: ndarray[tuple[Any, ...], dtype[uint8]] | None = None
Input image as numpy array (H, W, C) in uint8 format for image-to-image generation.
input_images
input_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Input images as list of numpy arrays (H, W, C) in uint8 format for image-to-image generation.
is_done
property is_done: bool
Whether the request has completed generation.
latent_image_ids
latent_image_ids: ndarray[tuple[Any, ...], dtype[float32]]
Precomputed latent image IDs for generation.
latents
latents: ndarray[tuple[Any, ...], dtype[float32]]
Precomputed initial noise (latents) for generation.
mask
mask: ndarray[tuple[Any, ...], dtype[bool]] | None = None
Mask for the text encoder’s attention.
model_name
model_name: str = ''
negative_mask
negative_mask: ndarray[tuple[Any, ...], dtype[bool]] | None = None
Mask for the negative text encoder path.
negative_text_ids
negative_text_ids: ndarray[tuple[Any, ...], dtype[int64]]
Precomputed text position IDs for the negative prompt.
negative_tokens
negative_tokens: TokenBuffer | None = None
Negative tokens for primary encoder.
negative_tokens_2
negative_tokens_2: TokenBuffer | None = None
Negative tokens for secondary encoder. None for single-encoder models.
num_images_per_prompt
num_images_per_prompt: int = 1
num_inference_steps
num_inference_steps: int = 50
num_warmup_steps
num_warmup_steps: int = 0
output_format
output_format: str = 'jpeg'
Image encoding format for the output (e.g., ‘jpeg’, ‘png’, ‘webp’).
prompt_images
prompt_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Optional prompt-conditioning images prepared by the tokenizer.
request_id
request_id: RequestID
reset()
reset()
Resets the context’s state.
Return type:
None
residual_threshold
residual_threshold: float | None = None
Per-request residual threshold for FBCache. None uses the pipeline default.
sigmas
sigmas: ndarray[tuple[Any, ...], dtype[float32]]
Precomputed sigma schedule for denoising.
status
status: GenerationStatus = 'active'
strength
strength: float = 0.6
text_ids
text_ids: ndarray[tuple[Any, ...], dtype[int64]]
Precomputed text position IDs, shape (B, seq_len, 4), int64.
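The documented shape can be checked with a placeholder array. Purely illustrative; the seq_len value below is an arbitrary example, not a model constant.

```python
import numpy as np

# Placeholder matching the documented text_ids shape (B, seq_len, 4), int64.
batch, seq_len = 1, 512  # seq_len is an arbitrary example value
text_ids = np.zeros((batch, seq_len, 4), dtype=np.int64)
print(text_ids.shape, text_ids.dtype)
```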
timesteps
timesteps: ndarray[tuple[Any, ...], dtype[float32]]
Precomputed timestep schedule for denoising.
to_generation_output()
to_generation_output()
Convert this context to a GenerationOutput object.
Return type:
GenerationOutput
tokens
tokens: TokenBuffer
Primary encoder tokens.
tokens_2
tokens_2: TokenBuffer | None = None
Secondary encoder tokens. None for single-encoder models.
true_cfg_scale
true_cfg_scale: float = 1.0
update()
update(latents)
Update the context with newly generated latents/image data.
vae_condition_images
vae_condition_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Optional VAE-conditioning images prepared by the tokenizer.
Qwen image edit keeps prompt-conditioning images and VAE-conditioning images separate because the multimodal prompt encoder and the VAE latent conditioning path use different resize targets.
width
width: int = 1024