Python class

ImageMetadata

class max.interfaces.ImageMetadata(*, start_idx, end_idx, pixel_values, image_hash=None)

Bases: object

Metadata about an image in the prompt.

Each image occupies the half-open range [start_idx, end_idx) in the text token array: start_idx is included and end_idx is excluded.

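Parameters:

  • start_idx (int)
  • end_idx (int)
  • pixel_values (ndarray[tuple[Any, ...], dtype[Any]])
  • image_hash (int | None)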

end_idx

end_idx: int

One past the index of the last <vision_token_id> special token for the image (the exclusive end of the range).

image_hash

image_hash: int | None = None

Hash of the image, used for prefix caching.

pixel_values

pixel_values: ndarray[tuple[Any, ...], dtype[Any]]

Pixel values for the image.

Can be various dtypes depending on the vision model:

  • float32: Original precision
  • uint16: BFloat16 bits stored as uint16 (workaround for NumPy’s lack of native bfloat16 support). Reinterpreted as bfloat16 on GPU.
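The uint16 representation can be illustrated with plain NumPy. This is only a sketch of the bit layout: the helper names float32_to_bf16_bits and bf16_bits_to_float32 are hypothetical and not part of max.interfaces, and the conversion here truncates the low mantissa bits, whereas a real preprocessing pipeline may round to nearest instead.

```python
import numpy as np


def float32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: keep the upper 16 bits of each float32 value.

    bfloat16 shares float32's sign and exponent layout, so its bit pattern
    is the top half of the float32 bit pattern (truncation, no rounding).
    """
    x = np.ascontiguousarray(x, dtype=np.float32)
    return (x.view(np.uint32) >> 16).astype(np.uint16)


def bf16_bits_to_float32(bits: np.ndarray) -> np.ndarray:
    """Hypothetical helper: widen bfloat16 bits back to float32 for inspection."""
    bits = np.ascontiguousarray(bits, dtype=np.uint16)
    return (bits.astype(np.uint32) << 16).view(np.float32)


# These sample values are exactly representable in bfloat16.
pixels = np.array([[0.0, 1.0], [-2.0, 0.5]], dtype=np.float32)
bits = float32_to_bf16_bits(pixels)        # dtype uint16, same shape as pixels
restored = bf16_bits_to_float32(bits)      # float32 view of the stored values
```

An array like bits is what the uint16 case above describes: NumPy treats it as integers, and the consumer reinterprets the same 16 bits as bfloat16.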

start_idx

start_idx: int

Index of the first <vision_token_id> special token for the image.
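Because the range is half-open, start_idx and end_idx can be used directly as Python slice bounds. The sketch below uses a stand-in dataclass (ImageSpan) and a made-up token ID rather than the real ImageMetadata class, so that it runs without MAX installed; both names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSpan:
    """Hypothetical stand-in holding only the two index fields."""

    start_idx: int
    end_idx: int


VISION_TOKEN_ID = 9999  # made-up placeholder, not a real model's token ID

# A toy prompt: two text tokens, three vision tokens, two more text tokens.
tokens = [1, 15, VISION_TOKEN_ID, VISION_TOKEN_ID, VISION_TOKEN_ID, 42, 2]
span = ImageSpan(start_idx=2, end_idx=5)

# The half-open range maps directly onto slice syntax.
image_tokens = tokens[span.start_idx:span.end_idx]

# The number of tokens covered by the image is end_idx - start_idx.
num_image_tokens = span.end_idx - span.start_idx
```

Here tokens[span.end_idx] is the first token after the image, which is why end_idx is described as one past the last vision token.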