
Python class

ModelOutputs

class max.pipelines.ModelOutputs(logits, next_token_logits=None, logit_offsets=None, hidden_states=None)

Bases: object

Pipeline model outputs.

Shape conventions below are for text-generation pipelines:

  • B: batch size
  • V: vocabulary size
  • H: hidden-state width
  • T: number of returned logit rows (depends on return mode)

The shapes of these outputs depend on the values of the ReturnLogits and ReturnHiddenStates enums. Unless the pipeline is running with speculative decoding, the defaults are ReturnLogits.LAST_TOKEN and ReturnHiddenStates.NONE.
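For orientation, here is a minimal sketch of constructing a ModelOutputs in the default configuration (ReturnLogits.LAST_TOKEN, ReturnHiddenStates.NONE); the numpy array stands in for the actual Buffer type, and the sizes are assumed:

```python
import numpy as np

from max.pipelines import ModelOutputs

B, V = 4, 32000  # assumed batch size and vocabulary size

# Default last-token mode: one logits row per sequence, shape [B, V].
last_token_logits = np.random.randn(B, V).astype(np.float32)

# Sketch only: real pipelines populate this with device Buffers,
# not numpy arrays.
outputs = ModelOutputs(logits=last_token_logits)
assert outputs.logits.shape == (B, V)
```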

hidden_states

hidden_states: Buffer | None = None

Optional hidden states for text generation.

Single-device shape is [T_h, H] where:

  • none mode (default): no hidden states are returned (the field is None)
  • last-token mode: T_h = B
  • all-token mode: T_h = total_input_tokens

For data-parallel models, the hidden states reside on the first GPU, since they are replicated across devices.
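As a small sketch, in last-token mode each sequence's final hidden state is simply one row of the buffer (numpy standing in for Buffer, sizes assumed):

```python
import numpy as np

B, H = 4, 4096  # assumed batch size and hidden-state width

# last-token mode: one hidden-state row per sequence, shape [B, H].
hidden_states = np.random.randn(B, H).astype(np.float32)

# Row i holds the final hidden state for sequence i.
h0 = hidden_states[0]
assert h0.shape == (H,)
```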

logit_offsets

logit_offsets: Buffer | None = None

Cumulative row offsets into logits for text generation.

Shape is [B + 1]. Per-sequence logits are: logits[logit_offsets[i]:logit_offsets[i + 1], :].
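For example, a sketch of unpacking a variable-length batch with these offsets; the row counts are illustrative and numpy stands in for Buffer:

```python
import numpy as np

V = 32000                    # assumed vocabulary size
rows_per_seq = [3, 1, 5]     # illustrative logit-row counts per sequence
B = len(rows_per_seq)

# Cumulative offsets, shape [B + 1]: here [0, 3, 4, 9].
logit_offsets = np.cumsum([0] + rows_per_seq)

# Packed logits: all returned rows across the batch, shape [T, V].
logits = np.random.randn(int(logit_offsets[-1]), V).astype(np.float32)

# Recover per-sequence logits exactly as documented above.
for i in range(B):
    seq_logits = logits[logit_offsets[i]:logit_offsets[i + 1], :]
    assert seq_logits.shape == (rows_per_seq[i], V)
```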

logits

logits: Buffer

Primary logits buffer.

For text generation this has shape [T, V] where:

  • last-token mode (default): T = B
  • all-token mode: T = total_input_tokens
  • variable mode: T = logit_offsets[-1] (typically B * return_n_logits)
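
A brief sketch of how T differs across the three modes, with assumed sizes (total_input_tokens is the sum of input lengths across the batch):

```python
input_lens = [7, 12, 3]            # assumed per-sequence input lengths
B = len(input_lens)
total_input_tokens = sum(input_lens)

T_last_token = B                   # last-token mode (default): 3 rows
T_all_tokens = total_input_tokens  # all-token mode: 22 rows
# Variable mode: T = logit_offsets[-1]; see logit_offsets above.
```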

next_token_logits

next_token_logits: Buffer | None = None

Next-token logits for text generation, shape [B, V] when present.
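
As a usage sketch, greedy decoding from next_token_logits is an argmax over the vocabulary axis (numpy standing in for Buffer, sizes assumed):

```python
import numpy as np

B, V = 4, 32000  # assumed batch size and vocabulary size

next_token_logits = np.random.randn(B, V).astype(np.float32)

# Greedy decoding: highest-scoring token id for each sequence.
next_tokens = next_token_logits.argmax(axis=-1)  # shape [B]
assert next_tokens.shape == (B,)
```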