Python class
ImageMetadata
class max.interfaces.ImageMetadata(*, start_idx, end_idx, pixel_values, image_hash=None)
Bases: object
Metadata about an image in the prompt.
Each image corresponds to the half-open range [start_idx, end_idx) in the text token array: start_idx is included and end_idx is excluded.
Parameters:
end_idx
end_idx: int
One past the index of the last <vision_token_id> special token for the image.
image_hash
Hash of the image, for use in prefix caching. Defaults to None.
pixel_values
Pixel values for the image.
The dtype depends on the vision model:
- float32: Original precision
- uint16: BFloat16 bits stored as uint16 (workaround for NumPy’s lack of native bfloat16 support). Reinterpreted as bfloat16 on GPU.
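The uint16 convention above can be illustrated with plain NumPy. The following is a minimal sketch, not the library's own conversion code: the helper names `float32_to_bfloat16_bits` and `bfloat16_bits_to_float32` are hypothetical, and the sketch assumes simple truncation of the low 16 bits (a real pipeline may round to nearest instead).

```python
import numpy as np


def float32_to_bfloat16_bits(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: keep the top 16 bits of each float32 as uint16.

    bfloat16 shares float32's sign and exponent layout, so its bit pattern
    is the upper half of the float32 bit pattern. This truncates; it does
    not round to nearest.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> np.uint32(16)).astype(np.uint16)


def bfloat16_bits_to_float32(bits: np.ndarray) -> np.ndarray:
    """Hypothetical helper: widen stored bfloat16 bits back to float32."""
    widened = bits.astype(np.uint32) << np.uint32(16)
    return widened.view(np.float32)


# Values chosen to be exactly representable in bfloat16.
pixels = np.array([[0.0, 1.0], [-2.0, 0.5]], dtype=np.float32)
packed = float32_to_bfloat16_bits(pixels)      # dtype uint16, same shape
restored = bfloat16_bits_to_float32(packed)    # dtype float32
```

Because the array only carries raw bits, the consumer has to know that a uint16 array means bfloat16 data and reinterpret it accordingly, rather than treating the values as integers.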
start_idx
start_idx: int
Index of the first <vision_token_id> special token for the image.
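As an illustration of how start_idx and end_idx relate to the token array, here is a minimal sketch that derives both from a list of token IDs. It does not use the library itself: `ImageSpan`, `find_image_spans`, and the token ID value 99 are hypothetical, and the sketch assumes each contiguous run of vision tokens belongs to exactly one image (two images placed back to back would be merged).

```python
from dataclasses import dataclass

VISION_TOKEN_ID = 99  # hypothetical value; the real ID depends on the model


@dataclass(frozen=True)
class ImageSpan:
    """Hypothetical stand-in holding only the two index fields."""

    start_idx: int  # index of the first vision token
    end_idx: int    # one past the index of the last vision token


def find_image_spans(tokens: list[int], vision_token_id: int) -> list[ImageSpan]:
    """Return one half-open span per contiguous run of vision tokens."""
    spans: list[ImageSpan] = []
    start = None
    for i, tok in enumerate(tokens):
        if tok == vision_token_id:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            spans.append(ImageSpan(start, i))  # run ended at i - 1
            start = None
    if start is not None:
        # The run extends to the end of the token array.
        spans.append(ImageSpan(start, len(tokens)))
    return spans


tokens = [1, 5, 99, 99, 99, 7, 99, 99, 2]
spans = find_image_spans(tokens, VISION_TOKEN_ID)
first = spans[0]
image_tokens = tokens[first.start_idx:first.end_idx]
```

With a half-open range, the slice `tokens[start_idx:end_idx]` selects exactly the image's vision tokens, and `end_idx - start_idx` gives their count.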