
Python class

ModelOutputs

class max.pipelines.ModelOutputs(logits, next_token_logits=None, logit_offsets=None, hidden_states=None)

Bases: object

Pipeline model outputs.

Shape conventions below are for text-generation pipelines:

  • B: batch size
  • V: vocabulary size
  • H: hidden-state width
  • T: number of returned logit rows (depends on return mode)

The shapes of these outputs depend on the values of the ReturnLogits and ReturnHiddenStates enums. Unless the pipeline is running with speculative decoding, the defaults are ReturnLogits.LAST_TOKEN and ReturnHiddenStates.NONE.
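For orientation, here is a minimal sketch of constructing a ModelOutputs in the default configuration (ReturnLogits.LAST_TOKEN, ReturnHiddenStates.NONE); the numpy array stands in for the actual Buffer type, and the sizes are assumed:

```python
import numpy as np

from max.pipelines import ModelOutputs

B, V = 4, 32000  # assumed batch size and vocabulary size

# Default last-token mode: one logits row per sequence, shape [B, V].
last_token_logits = np.random.randn(B, V).astype(np.float32)

# Sketch only: real pipelines populate this with device Buffers,
# not numpy arrays.
outputs = ModelOutputs(logits=last_token_logits)
assert outputs.logits.shape == (B, V)
```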

hidden_states

hidden_states: Buffer | None = None

Optional hidden states for text generation.

Single-device shape is [T_h, H] where:

  • none mode (default): no hidden states are returned (the field is None)
  • last-token mode: T_h = B
  • all-token mode: T_h = total_input_tokens

For data-parallel models, the hidden states reside on the first GPU, since they are replicated across devices.
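As a small sketch, in last-token mode each sequence's final hidden state is simply one row of the buffer (numpy standing in for Buffer, sizes assumed):

```python
import numpy as np

B, H = 4, 4096  # assumed batch size and hidden-state width

# last-token mode: one hidden-state row per sequence, shape [B, H].
hidden_states = np.random.randn(B, H).astype(np.float32)

# Row i holds the final hidden state for sequence i.
h0 = hidden_states[0]
assert h0.shape == (H,)
```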

logit_offsets

logit_offsets: Buffer | None = None

Cumulative row offsets into logits for text generation.

Shape is [B + 1]. Per-sequence logits are: logits[logit_offsets[i]:logit_offsets[i + 1], :].
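For example, a sketch of unpacking a variable-length batch with these offsets; the row counts are illustrative and numpy stands in for Buffer:

```python
import numpy as np

V = 32000                    # assumed vocabulary size
rows_per_seq = [3, 1, 5]     # illustrative logit-row counts per sequence
B = len(rows_per_seq)

# Cumulative offsets, shape [B + 1]: here [0, 3, 4, 9].
logit_offsets = np.cumsum([0] + rows_per_seq)

# Packed logits: all returned rows across the batch, shape [T, V].
logits = np.random.randn(int(logit_offsets[-1]), V).astype(np.float32)

# Recover per-sequence logits exactly as documented above.
for i in range(B):
    seq_logits = logits[logit_offsets[i]:logit_offsets[i + 1], :]
    assert seq_logits.shape == (rows_per_seq[i], V)
```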

logits

logits: Buffer

Primary logits buffer.

For text generation this has shape [T, V] where:

  • last-token mode (default): T = B
  • all-token mode: T = total_input_tokens
  • variable mode: T = logit_offsets[-1] (typically B * return_n_logits)
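
A brief sketch of how T differs across the three modes, with assumed sizes (total_input_tokens is the sum of input lengths across the batch):

```python
input_lens = [7, 12, 3]            # assumed per-sequence input lengths
B = len(input_lens)
total_input_tokens = sum(input_lens)

T_last_token = B                   # last-token mode (default): 3 rows
T_all_tokens = total_input_tokens  # all-token mode: 22 rows
# Variable mode: T = logit_offsets[-1]; see logit_offsets above.
```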

next_token_logits

next_token_logits: Buffer | None = None

Next-token logits for text generation, shape [B, V] when present.
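
As a usage sketch, greedy decoding from next_token_logits is an argmax over the vocabulary axis (numpy standing in for Buffer, sizes assumed):

```python
import numpy as np

B, V = 4, 32000  # assumed batch size and vocabulary size

next_token_logits = np.random.randn(B, V).astype(np.float32)

# Greedy decoding: highest-scoring token id for each sequence.
next_tokens = next_token_logits.argmax(axis=-1)  # shape [B]
assert next_tokens.shape == (B,)
```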