Python class
VLMTextGenerationContext
class max.interfaces.VLMTextGenerationContext(*args, **kwargs)
Bases: TextGenerationContext, Protocol
Protocol defining the interface for VLM input contexts.
compute_image_aligned_idx()
compute_image_aligned_idx(idx)
Aligns an index downward to avoid splitting an image token span.
If idx falls within the token range occupied by an image, this
method returns the start_idx of that image so that the split point
does not cut through image tokens. If idx does not land inside any
image span, it is returned unchanged.
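The alignment rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `ImageSpan`, its `end_idx` field (assumed exclusive), and the free function `align_idx_to_image_start` are hypothetical names introduced here; only `start_idx` and the round-down behavior come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSpan:
    """Hypothetical stand-in for the token range occupied by one image."""

    start_idx: int  # first token position of the image
    end_idx: int    # one past the last image token (assumed exclusive)


def align_idx_to_image_start(idx: int, spans: list[ImageSpan]) -> int:
    """Round idx down to the start of the image span containing it, if any."""
    for span in spans:
        if span.start_idx <= idx < span.end_idx:
            # idx would cut through this image, so move the split point
            # back to the first token of the image.
            return span.start_idx
    # idx is not inside any image span: leave it unchanged.
    return idx


spans = [ImageSpan(4, 10), ImageSpan(15, 20)]
print(align_idx_to_image_start(7, spans))   # inside the first image
print(align_idx_to_image_start(12, spans))  # between images
```

Under these assumptions, a split requested at position 7 moves back to 4, while a split at 12 stays at 12 because it falls between the two images.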
image_idx
property image_idx: int
Index of the next unencoded image in the prompt.
image_token_indices
property image_token_indices: ndarray[tuple[Any, ...], dtype[int32]]
Positions of image-placeholder tokens within this context’s token buffer.
Offsets are relative to the start of the full token sequence (not the
active window). Used by compute_multimodal_merge_indices to build
batch-level scatter indices that account for processed_length.
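To illustrate why the distinction between full-sequence offsets and the active window matters, here is a hedged sketch of one way such positions could be rebased. The helper `rebase_to_active_window` and its `batch_offset` parameter are hypothetical and are not part of `max.interfaces`; the sketch also uses plain Python lists for portability, whereas the real property returns an int32 ndarray. It should not be read as the behavior of `compute_multimodal_merge_indices`.

```python
def rebase_to_active_window(
    image_token_indices: list[int],
    processed_length: int,
    batch_offset: int = 0,
) -> list[int]:
    """Map full-sequence positions to positions in a flattened batch.

    Assumptions (not confirmed by the source):
    - tokens before processed_length are outside the active window;
    - batch_offset is where this context's active window begins in the
      flattened batch.
    """
    return [
        batch_offset + (pos - processed_length)
        for pos in image_token_indices
        if pos >= processed_length  # skip placeholders already processed
    ]


full_sequence_positions = [2, 3, 4, 9, 10]
print(rebase_to_active_window(full_sequence_positions, processed_length=4))
# [0, 5, 6]
print(rebase_to_active_window(full_sequence_positions, processed_length=4, batch_offset=7))
# [7, 12, 13]
```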
images
property images: list[ImageMetadata]
The images in the context.
needs_vision_encoding
property needs_vision_encoding: bool
Whether vision encoding is needed for this context.
next_images
property next_images: list[ImageMetadata]
The images that are not yet encoded.
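The properties above suggest a simple relationship between `images`, `image_idx`, `next_images`, and `needs_vision_encoding`. The toy class below models one plausible reading. It assumes that `images` is ordered by position in the prompt, that `next_images` is the tail of `images` starting at `image_idx`, and that encoding is needed whenever that tail is non-empty. None of this is confirmed by the source, `ToyVLMContext` is a hypothetical name, and strings stand in for `ImageMetadata`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVLMContext:
    """Toy model of the image-related properties (illustrative only)."""

    images: list[str] = field(default_factory=list)  # stand-ins for ImageMetadata
    image_idx: int = 0  # index of the next unencoded image

    @property
    def next_images(self) -> list[str]:
        # Assumed: every image from image_idx onward is still unencoded.
        return self.images[self.image_idx:]

    @property
    def needs_vision_encoding(self) -> bool:
        # Assumed: encoding is needed while unencoded images remain.
        return len(self.next_images) > 0


ctx = ToyVLMContext(images=["img0", "img1", "img2"], image_idx=1)
print(ctx.next_images)            # ['img1', 'img2']
print(ctx.needs_vision_encoding)  # True
```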