Python class

PixelContext

class max.pipelines.PixelContext(*, tokens, request_id=<factory>, model_name='', mask=None, tokens_2=None, negative_tokens=None, negative_mask=None, negative_tokens_2=None, explicit_negative_prompt=False, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, text_ids=<factory>, negative_text_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, true_cfg_scale=1.0, strength=0.6, cfg_normalization=False, cfg_truncation=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None, input_images=None, prompt_images=None, vae_condition_images=None, output_format='jpeg', residual_threshold=None, status=GenerationStatus.ACTIVE)

Bases: object

A model-ready context for image/video generation requests.

Per the design doc, this class contains only numeric data that the model will execute against. User-facing strings (prompt, negative_prompt) are consumed during tokenization and do not appear here.

All preprocessing is performed by PixelGenerationTokenizer.new_context():

  • Prompt tokenization -> tokens field
  • Negative prompt tokenization -> negative_tokens field
  • Timestep schedule computation -> timesteps field
  • Initial noise generation -> latents field
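
As a rough illustration, a serving layer might obtain a model-ready context along these lines (a minimal sketch: the request object and the awaitable new_context() call are assumptions, not part of this reference):

  # Hypothetical usage sketch -- the tokenizer performs all preprocessing,
  # so the resulting PixelContext holds only numeric, model-ready data.
  tokenizer = PixelGenerationTokenizer(...)  # construction details omitted

  # ASSUMPTION: new_context() consumes the user-facing strings (prompt,
  # negative_prompt) and generation settings carried by `request`.
  context = await tokenizer.new_context(request)

  context.tokens           # prompt token buffer; the prompt string itself is gone
  context.negative_tokens  # negative-prompt tokens, or None
  context.timesteps        # precomputed denoising schedule
  context.latents          # precomputed initial noise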

cfg_normalization

cfg_normalization: bool = False

cfg_truncation

cfg_truncation: float = 1.0

compute_num_available_steps()

compute_num_available_steps(max_seq_len)

Compute the number of available steps for scheduler compatibility.

For image and video generation, this returns the number of inference steps.

Parameters:

max_seq_len (int)

Return type:

int
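
For example, because the schedule length is fixed when the context is built, the argument does not affect the result for image generation (a sketch; 4096 is an arbitrary value):

  steps = context.compute_num_available_steps(max_seq_len=4096)
  assert steps == context.num_inference_steps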

explicit_negative_prompt

explicit_negative_prompt: bool = False

Whether the request explicitly supplied a negative prompt.

guidance_scale

guidance_scale: float = 3.5

height

height: int = 1024

input_image

input_image: ndarray[tuple[Any, ...], dtype[uint8]] | None = None

Input image as a numpy array of shape (H, W, C) in uint8 format, used for image-to-image generation.

input_images

input_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None

Input images as a list of numpy arrays of shape (H, W, C) in uint8 format, used for image-to-image generation.
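
A sketch of preparing such inputs with numpy and Pillow (Pillow is illustrative only; the fields simply expect uint8 arrays of shape (H, W, C)):

  import numpy as np
  from PIL import Image

  img = np.asarray(Image.open("photo.png").convert("RGB"))  # (H, W, 3) uint8
  assert img.dtype == np.uint8 and img.ndim == 3

  context.input_image = img     # single-image field
  context.input_images = [img]  # list-valued field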

is_done

property is_done: bool

Whether the request has completed generation.

latent_image_ids

latent_image_ids: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed latent image IDs for generation.

latents

latents: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed initial noise (latents) for generation.

mask

mask: ndarray[tuple[Any, ...], dtype[bool]] | None = None

Mask for the text encoder’s attention.

model_name

model_name: str = ''

negative_mask

negative_mask: ndarray[tuple[Any, ...], dtype[bool]] | None = None

Mask for the negative text encoder path.

negative_text_ids

negative_text_ids: ndarray[tuple[Any, ...], dtype[int64]]

Precomputed text position IDs for the negative prompt.

negative_tokens

negative_tokens: TokenBuffer | None = None

Negative tokens for primary encoder.

negative_tokens_2

negative_tokens_2: TokenBuffer | None = None

Negative tokens for secondary encoder. None for single-encoder models.

num_images_per_prompt

num_images_per_prompt: int = 1

num_inference_steps

num_inference_steps: int = 50

num_warmup_steps

num_warmup_steps: int = 0

output_format

output_format: str = 'jpeg'

Image encoding format for the output (e.g., ‘jpeg’, ‘png’, ‘webp’).

prompt_images

prompt_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None

Optional prompt-conditioning images prepared by the tokenizer.

request_id

request_id: RequestID

reset()

reset()

Reset the context’s state.

Return type:

None

residual_threshold

residual_threshold: float | None = None

Per-request residual threshold for FBCache. None uses the pipeline default.

sigmas

sigmas: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed sigma schedule for denoising.

status

status: GenerationStatus = GenerationStatus.ACTIVE

strength

strength: float = 0.6

text_ids

text_ids: ndarray[tuple[Any, ...], dtype[int64]]

Precomputed text position IDs, shape (B, seq_len, 4) int64.

timesteps

timesteps: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed timestep schedule for denoising.
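
The timesteps and sigmas fields together describe the denoising schedule. As a conceptual sketch of the kind of data they hold (a simple linear, flow-matching-style schedule; the actual scheduler is model-specific):

  import numpy as np

  num_inference_steps = 50
  # Sigmas decrease from 1.0 (pure noise) toward 0.0 (clean sample).
  sigmas = np.linspace(1.0, 0.0, num_inference_steps + 1, dtype=np.float32)
  # One timestep per denoising step, scaled into the model's range.
  timesteps = (sigmas[:-1] * 1000.0).astype(np.float32)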

to_generation_output()

to_generation_output()

Convert this context to a GenerationOutput object.

Return type:

GenerationOutput

tokens

tokens: TokenBuffer

Primary encoder tokens.

tokens_2

tokens_2: TokenBuffer | None = None

Secondary encoder tokens. None for single-encoder models.

true_cfg_scale

true_cfg_scale: float = 1.0

update()

update(latents)

Update the context with newly generated latents/image data.

Parameters:

latents (ndarray[tuple[Any, ...], dtype[Any]])

Return type:

None
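
Together with is_done and to_generation_output(), update() supports a simple stepping loop. A minimal sketch (denoise_step is a hypothetical stand-in for the actual model invocation):

  while not context.is_done:
      new_latents = denoise_step(context)  # HYPOTHETICAL: one denoising step
      context.update(new_latents)

  output = context.to_generation_output()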

vae_condition_images

vae_condition_images: list[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None

Optional VAE-conditioning images prepared by the tokenizer.

Qwen image edit keeps prompt-conditioning images and VAE-conditioning images separate because the multimodal prompt encoder and the VAE latent conditioning path use different resize targets.
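
A sketch of the distinction, with illustrative sizes (the resize targets below are assumptions for illustration, not documented values):

  import numpy as np
  from PIL import Image

  src = Image.open("reference.png").convert("RGB")

  # ASSUMED sizes, for illustration only: the same source image is resized
  # differently for the multimodal prompt encoder and for the VAE latent path.
  prompt_images = [np.asarray(src.resize((384, 384)))]
  vae_condition_images = [np.asarray(src.resize((1024, 1024)))]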

width

width: int = 1024
