Python class
PixelContext
class max.pipelines.PixelContext(*, tokens, request_id=<factory>, model_name='', mask=None, tokens_2=None, negative_tokens=None, negative_mask=None, negative_tokens_2=None, explicit_negative_prompt=False, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, text_ids=<factory>, negative_text_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, true_cfg_scale=1.0, strength=0.6, cfg_normalization=False, cfg_truncation=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None, input_images=None, prompt_images=None, vae_condition_images=None, output_format='jpeg', residual_threshold=None, status=GenerationStatus.ACTIVE)
Bases: object
A model-ready context for image/video generation requests.
Per the design doc, this class contains only numeric data that the model will execute against. User-facing strings (prompt, negative_prompt) are consumed during tokenization and do not appear here.
All preprocessing is performed by PixelGenerationTokenizer.new_context():
- Prompt tokenization -> tokens field
- Negative prompt tokenization -> negative_tokens field
- Timestep schedule computation -> timesteps field
- Initial noise generation -> latents field
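The division of labor above — strings consumed during tokenization, only numeric arrays stored on the context — can be illustrated with a simplified stand-in. The names and shapes below (toy tokenizer, 8x latent downsampling factor, 16 latent channels) are assumptions for illustration, not the real `max.pipelines` implementation:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MiniPixelContext:
    """Simplified stand-in for PixelContext: numeric data only, no prompt strings."""
    tokens: np.ndarray
    timesteps: np.ndarray
    latents: np.ndarray


def new_context(prompt: str, num_inference_steps: int = 50,
                height: int = 1024, width: int = 1024) -> MiniPixelContext:
    # Toy tokenization: map characters to integer IDs. The real pipeline uses
    # PixelGenerationTokenizer; the point is that the prompt string is
    # consumed here and never stored on the context.
    tokens = np.array([ord(c) for c in prompt], dtype=np.int64)
    # Toy linear timestep schedule from 1.0 down to 0.0 (real schedules vary).
    timesteps = np.linspace(1.0, 0.0, num_inference_steps, dtype=np.float32)
    # Initial Gaussian noise in an 8x-downsampled latent space (assumed factor).
    latents = np.random.randn(1, 16, height // 8, width // 8).astype(np.float32)
    return MiniPixelContext(tokens, timesteps, latents)


ctx = new_context("a photo of a cat", num_inference_steps=4)
```

Note that `MiniPixelContext` carries no `prompt` attribute at all, mirroring the design constraint described above.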
Parameters:
- tokens (TokenBuffer) – Tokenized prompt token IDs.
- request_id (RequestID) – A unique identifier for this generation request.
- model_name (str) – Name of the model being used.
- mask (ndarray[tuple[Any, ...], dtype[bool]] | None)
- tokens_2 (TokenBuffer | None)
- negative_tokens (TokenBuffer | None) – Tokenized negative prompt token IDs.
- negative_mask (ndarray[tuple[Any, ...], dtype[bool]] | None)
- negative_tokens_2 (TokenBuffer | None)
- explicit_negative_prompt (bool)
- timesteps (ndarray[tuple[Any, ...], dtype[float32]]) – Precomputed timestep schedule for denoising.
- sigmas (ndarray[tuple[Any, ...], dtype[float32]])
- latents (ndarray[tuple[Any, ...], dtype[float32]]) – Precomputed initial noise (latents).
- latent_image_ids (ndarray[tuple[Any, ...], dtype[float32]])
- text_ids (ndarray[tuple[Any, ...], dtype[int64]])
- negative_text_ids (ndarray[tuple[Any, ...], dtype[int64]])
- height (int) – Height of the generated image/video in pixels.
- width (int) – Width of the generated image/video in pixels.
- num_inference_steps (int) – Number of denoising steps.
- guidance_scale (float) – Guidance scale for classifier-free guidance.
- true_cfg_scale (float)
- strength (float)
- cfg_normalization (bool)
- cfg_truncation (float)
- num_warmup_steps (int)
- num_images_per_prompt (int) – Number of images/videos to generate per prompt.
- input_image (ndarray[tuple[Any, ...], dtype[uint8]] | None) – Optional HWC uint8 numpy array for image-to-image generation.
- input_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None) – Optional list of input images for image-to-image generation.
- prompt_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None)
- vae_condition_images (list[ndarray[tuple[Any, ...], dtype[uint8]]] | None)
- output_format (str)
- residual_threshold (float | None)
- status (GenerationStatus)
cfg_normalization
cfg_normalization: bool = False
cfg_truncation
cfg_truncation: float = 1.0
compute_num_available_steps()
compute_num_available_steps(max_seq_len)
Compute the number of available steps for scheduler compatibility.
For image and video generation, this returns the number of inference steps.
explicit_negative_prompt
explicit_negative_prompt: bool = False
Whether the request explicitly supplied a negative prompt.
guidance_scale
guidance_scale: float = 3.5
height
height: int = 1024
input_image
input_image: ndarray[tuple[Any, ...], dtype[uint8]] | None = None
Input image as a numpy array (H, W, C) in uint8 format for image-to-image generation.
input_images
input_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Input images as a list of numpy arrays (H, W, C) in uint8 format for image-to-image generation.
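Images often arrive as float arrays or in channel-first layout, so they need coercion into the (H, W, C) uint8 format documented here. A small helper can do that; `to_hwc_uint8` is an illustrative name, not part of the `max.pipelines` API:

```python
import numpy as np


def to_hwc_uint8(img: np.ndarray) -> np.ndarray:
    """Coerce an image into the (H, W, C) uint8 layout expected by
    input_image / input_images. Handles float images in [0, 1] and
    channel-first (C, H, W) inputs. Illustrative helper only."""
    # Channel-first (C, H, W) with a plausible channel count? Move channels last.
    if img.ndim == 3 and img.shape[0] in (1, 3, 4) and img.shape[2] not in (1, 3, 4):
        img = np.transpose(img, (1, 2, 0))  # CHW -> HWC
    # Float pixels in [0, 1] -> [0, 255].
    if np.issubdtype(img.dtype, np.floating):
        img = np.clip(img, 0.0, 1.0) * 255.0
    return img.astype(np.uint8)


chw = np.random.rand(3, 512, 512).astype(np.float32)  # channel-first float image
hwc = to_hwc_uint8(chw)  # (512, 512, 3) uint8
```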
is_done
property is_done: bool
Whether the request has completed generation.
latent_image_ids
Precomputed latent image IDs for generation.
latents
Precomputed initial noise (latents) for generation.
mask
Mask for the text encoder's attention.
model_name
model_name: str = ''
negative_mask
negative_mask: ndarray[tuple[Any, ...], dtype[bool]] | None = None
Mask for the negative text encoder path.
negative_text_ids
Precomputed text position IDs for the negative prompt.
negative_tokens
negative_tokens: TokenBuffer | None = None
Negative prompt tokens for the primary encoder.
negative_tokens_2
negative_tokens_2: TokenBuffer | None = None
Negative prompt tokens for the secondary encoder. None for single-encoder models.
num_images_per_prompt
num_images_per_prompt: int = 1
num_inference_steps
num_inference_steps: int = 50
num_warmup_steps
num_warmup_steps: int = 0
output_format
output_format: str = 'jpeg'
Image encoding format for the output (e.g., 'jpeg', 'png', 'webp').
prompt_images
prompt_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Optional prompt-conditioning images prepared by the tokenizer.
request_id
request_id: RequestID
reset()
reset()
Resets the context's state.
Return type:
None
residual_threshold
Per-request residual threshold for FBCache. None uses the pipeline default.
sigmas
Precomputed sigma schedule for denoising.
status
status: GenerationStatus = 'active'
strength
strength: float = 0.6
text_ids
Precomputed text position IDs, shape (B, seq_len, 4), int64.
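The documented (B, seq_len, 4) int64 layout can be reproduced with numpy. Zero-filling the positions for text tokens follows the common Flux-style convention, but the exact ID scheme used by this pipeline is an assumption here:

```python
import numpy as np


def make_text_ids(batch: int, seq_len: int) -> np.ndarray:
    # Zero-filled position IDs with the documented shape (B, seq_len, 4).
    # Zero-filling for text tokens mirrors the common Flux-style convention;
    # the actual scheme used by the pipeline may differ.
    return np.zeros((batch, seq_len, 4), dtype=np.int64)


text_ids = make_text_ids(1, 77)  # e.g. a 77-token prompt
```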
timesteps
Precomputed timestep schedule for denoising.
to_generation_output()
to_generation_output()
Convert this context to a GenerationOutput object.
Return type:
GenerationOutput
tokens
tokens: TokenBuffer
Primary encoder tokens.
tokens_2
tokens_2: TokenBuffer | None = None
Secondary encoder tokens. None for single-encoder models.
true_cfg_scale
true_cfg_scale: float = 1.0
update()
update(latents)
Update the context with newly generated latents/image data.
vae_condition_images
vae_condition_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None
Optional VAE-conditioning images prepared by the tokenizer.
Qwen image edit keeps prompt-conditioning images and VAE-conditioning images separate because the multimodal prompt encoder and the VAE latent conditioning path use different resize targets.
width
width: int = 1024