Python class

ModelInputs

`ModelInputs`

class max.pipelines.ModelInputs(*, kv_cache_inputs=None, lora=None, vision_embeddings=<factory>, vision_scatter_indices=<factory>, hidden_states=None)

Bases: object

Base class for model inputs.

Use this class to encapsulate inputs for your model; you may store any number of dataclass fields.

The following example demonstrates how to create a custom inputs class:

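from dataclasses import dataclass

# Import paths below are assumed from the MAX package layout.
from max.driver import Buffer
from max.dtype import DType
from max.pipelines import ModelInputs


@dataclass
class ReplitInputs(ModelInputs):
    tokens: Buffer
    input_row_offsets: Buffer

tokens = Buffer.zeros((1, 2, 3), DType.int64)
input_row_offsets = Buffer.zeros((1, 1, 1), DType.int64)

# Initialize inputs
inputs = ReplitInputs(tokens=tokens, input_row_offsets=input_row_offsets)

# Access tensors
list(inputs) == [tokens, input_row_offsets]  # Output: True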

Parameters:

kv_cache_inputs (KVCacheInputsInterface[Buffer, Buffer] | None)
lora (LoRAInputs | None)
vision_embeddings (list[Buffer])
vision_scatter_indices (list[Buffer])
hidden_states (Buffer | list[Buffer] | None)

`buffers`

property buffers: tuple[Buffer, ...]

Returns positional Buffer inputs for model ABI calls.
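
To illustrate what "positional" means here, the following is a minimal sketch that runs without MAX installed. `SketchReplitInputs` is a hypothetical stand-in, strings take the place of Buffer objects, and the field order shown is an assumption rather than the behavior of the real property.

```python
from dataclasses import dataclass


@dataclass
class SketchReplitInputs:
    """Hypothetical stand-in; strings take the place of Buffer objects."""

    tokens: str
    input_row_offsets: str

    @property
    def buffers(self) -> tuple[str, ...]:
        # Assumed ordering: the tuple order is the order in which a compiled
        # model would receive its positional arguments in this sketch.
        return (self.tokens, self.input_row_offsets)


inputs = SketchReplitInputs("tokens_buf", "offsets_buf")
positional_args = inputs.buffers
```

A caller could then unpack the tuple into a call, for example `model(*inputs.buffers)`, where `model` is also hypothetical.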

`hidden_states`

hidden_states: Buffer | list[Buffer] | None = None

Hidden states for a variable number of tokens per sequence.

For data parallel models, this can be a list of Buffers where each Buffer contains hidden states for the sequences assigned to that device.
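
Because the field accepts a single Buffer, a list of Buffers, or None, calling code may want one uniform shape. The sketch below shows one way to normalize the three forms into a per-device list. `FakeBuffer` and `per_device_hidden_states` are hypothetical names introduced for illustration; neither is part of MAX.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for Buffer; it only records a shape."""

    shape: tuple[int, ...]


def per_device_hidden_states(
    hidden_states: Union[FakeBuffer, list[FakeBuffer], None],
) -> list[FakeBuffer]:
    """Normalize the three accepted forms into one list, one entry per device."""
    if hidden_states is None:
        return []
    if isinstance(hidden_states, list):
        # Data parallel case: already one buffer per device.
        return hidden_states
    # Single-device case: wrap the lone buffer.
    return [hidden_states]


single = per_device_hidden_states(FakeBuffer(shape=(7, 4096)))
data_parallel = per_device_hidden_states(
    [FakeBuffer(shape=(3, 4096)), FakeBuffer(shape=(4, 4096))]
)
```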

`kv_cache_inputs`

kv_cache_inputs: KVCacheInputsInterface[Buffer, Buffer] | None = None

KV cache graph inputs for every device, one set per (DP replica × TP shard) pair. The value is either a KVCacheInputs leaf or, for multi-cache models, a MultiKVCacheInputs tree. Calling flatten() yields the full positional input list.
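
The leaf-or-tree structure with a shared flatten() method can be sketched as follows. `LeafInputs` and `TreeInputs` are hypothetical stand-ins: the real KVCacheInputs and MultiKVCacheInputs fields are not shown on this page and are not modeled here, and the depth-first ordering is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class LeafInputs:
    """Hypothetical leaf holding one cache's positional inputs."""

    values: list[str]

    def flatten(self) -> list[str]:
        return list(self.values)


@dataclass
class TreeInputs:
    """Hypothetical tree holding several leaves or nested trees."""

    children: list = field(default_factory=list)

    def flatten(self) -> list[str]:
        # Assumed order: depth-first, left to right, so positions are stable.
        flat: list[str] = []
        for child in self.children:
            flat.extend(child.flatten())
        return flat


tree = TreeInputs(
    [
        LeafInputs(["blocks_0", "lookup_0"]),
        TreeInputs([LeafInputs(["blocks_1", "lookup_1"])]),
    ]
)
flat_inputs = tree.flatten()
```

Giving both node types the same method lets callers treat a single-cache model and a multi-cache model identically.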

`lora`

lora: LoRAInputs | None = None

Per-batch LoRA adapter buffers, or None when LoRA is disabled.

`update()`

update(**kwargs)

Updates attributes from keyword arguments. Only attributes that already exist on the instance are set, and only when the supplied value is not None.

Return type: None
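
The filtering rule can be sketched with a plain dataclass. This is a minimal stand-in, not the MAX implementation: `SketchInputs` and its fields are hypothetical, and the rule (skip unknown names and None values) is an assumption read from the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchInputs:
    """Hypothetical stand-in for a ModelInputs subclass (not the MAX class)."""

    tokens: Optional[list[int]] = None
    lora: Optional[str] = None

    def update(self, **kwargs: Any) -> None:
        # Assumed rule: set only attributes that already exist, and only
        # when the supplied value is not None.
        for key, value in kwargs.items():
            if hasattr(self, key) and value is not None:
                setattr(self, key, value)


inputs = SketchInputs(tokens=[1, 2, 3], lora="adapter-a")
# tokens is replaced; lora=None is skipped; unknown_field is ignored.
inputs.update(tokens=[4, 5], lora=None, unknown_field=42)
```

Under this rule, update() cannot be used to reset a field to None; that would require assigning the attribute directly.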

`vision_embeddings`

vision_embeddings: list[Buffer]

Per-device vision-merge embedding inputs for the language graph. The pipeline’s vision seam (finalize_vision_inputs) sets this field on every prepared batch of a vision-capable model: it holds the assembled embeddings when the step encoded images, and the model’s cached zero-row empty buffers otherwise. The list stays empty for text-only architectures.

`vision_scatter_indices`

vision_scatter_indices: list[Buffer]

Per-device merge (scatter) indices for vision_embeddings, with the same lifecycle.
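
As a rough picture of what a scatter-style merge does with these two inputs, the sketch below writes each vision row into a sequence of text embedding rows at the position named by its index. It uses nested Python lists instead of Buffers, and `merge_vision_embeddings` is a hypothetical helper, not a MAX function. The actual merge runs inside the language graph and may differ.

```python
def merge_vision_embeddings(
    text_rows: list[list[float]],
    vision_rows: list[list[float]],
    scatter_indices: list[int],
) -> list[list[float]]:
    """Return a copy of text_rows with vision_rows[i] written at scatter_indices[i]."""
    if len(vision_rows) != len(scatter_indices):
        raise ValueError("need exactly one scatter index per vision row")
    merged = [row[:] for row in text_rows]  # copy so the input is not mutated
    for row, index in zip(vision_rows, scatter_indices):
        merged[index] = row[:]
    return merged


text = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

# A step that encoded one image patch destined for position 1.
with_image = merge_vision_embeddings(text, [[1.0, 2.0]], [1])

# A step with no images: zero-row inputs leave the text rows unchanged.
without_image = merge_vision_embeddings(text, [], [])
```

The zero-row case shows why empty inputs can stand in on text-only steps: the merge has nothing to write, so the same code path produces the original rows.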
