Python module

max.pipelines.modeling.types

Universal interfaces between all aspects of the MAX Inference Stack.

Pipeline base

InputModality: Enum representing the types of input a model architecture accepts.
Pipeline: Defines the interface for pipeline operations.
PipelineInputs: Base class representing inputs to a pipeline operation.
PipelineInputsType: Type variable.
PipelineOutput: Protocol representing the output of a pipeline operation.
PipelineOutputType: Type variable.
PipelineOutputsDict: dict() -> new empty dictionary; dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs; dict(iterable) -> new dictionary initialized as if via: d = {}; for k, v in iterable: d[k] = v; dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2).
PipelinesFactory: Type alias for factory functions that create pipeline instances.
PipelineTask: Enum representing the types of pipeline tasks supported.
PipelineTokenizer: Interface for LLM tokenizers.
TokenizerEncoded: Type variable.
UnboundContextType: Type variable.

Text generation

BatchType: Type of batch.
MessageContent: Represent a PEP 604 union type.
SpecDecodingState: Per-request state for speculative decoding.
TextContentPart: A plain-text content part of a message.
TextGenerationContext: Protocol defining the interface for text generation contexts in token generation.
TextGenerationContextType: Type variable.
TextGenerationInputs: Input parameters for text generation pipeline operations.
TextGenerationOutput: Represents the output of a text generation operation.
TextGenerationRequest: An immutable request for text token generation from a pipeline.
TextGenerationRequestFunction: Represents a function definition for a text generation request.
TextGenerationRequestMessage: A single message in a text generation request conversation.
TextGenerationRequestTool: Represents a tool definition for a text generation request.
TextGenerationResponseFormat: Represents the response format specification for a text generation request.
VLMTextGenerationContext: Protocol defining the interface for VLM input contexts.

Embeddings

EmbeddingsContext: Protocol defining the interface for embeddings generation contexts.
EmbeddingsGenerationContextType: Type variable.
EmbeddingsGenerationInputs: Batched inputs for an embeddings generation pipeline step.
EmbeddingsGenerationOutput: Response structure for embedding generation.

Audio generation

AudioGenerationContextType: Type variable.
AudioGenerationInputs: Input data structure for audio generation pipelines.
AudioGenerationMetadata: Represents metadata associated with audio generation.
AudioGenerationOutput: Represents a response from the audio generation API.
AudioGenerationRequest: An immutable request for audio generation from a pipeline.

Image generation

ImageContentPart: An image content part of a message.
ImageMetadata: Metadata about an image in the prompt.
PixelGenerationContext: Protocol defining the interface for pixel generation contexts.
PixelGenerationContextType: Type variable.
PixelGenerationInputs: Input data structure for pixel generation pipelines.
VideoContentPart: A video content part of a message.

Reasoning

ParsedReasoningDelta: Result of applying reasoning parsing to a streaming delta chunk.
ReasoningParser: Parser for identifying reasoning spans in model output.
ReasoningSpan: Identifies a reasoning span within a token ID sequence.
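
To make the idea of a reasoning span within a token ID sequence concrete, the sketch below scans a token list for a start and an end delimiter. It does not use the MAX API: `Span`, `find_reasoning_span`, and the delimiter IDs 100 and 101 are hypothetical stand-ins chosen for illustration, and the real ReasoningSpan and ReasoningParser may be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in: half-open [start, end) indices into a token list."""

    start: int
    end: int


def find_reasoning_span(
    token_ids: Sequence[int], start_id: int, end_id: int
) -> Optional[Span]:
    """Return the span between the first start_id and the next end_id.

    The delimiter tokens themselves are excluded. If the end delimiter has
    not appeared yet (for example mid-stream), the span runs to the end of
    the sequence. Returns None when no start delimiter is present.
    """
    ids = list(token_ids)
    try:
        begin = ids.index(start_id) + 1
    except ValueError:
        return None  # no reasoning section at all
    try:
        end = ids.index(end_id, begin)
    except ValueError:
        end = len(ids)  # reasoning section is still open
    return Span(begin, end)


# Assumed delimiter IDs: 100 opens the reasoning section, 101 closes it.
tokens = [7, 100, 11, 12, 13, 101, 42]
span = find_reasoning_span(tokens, start_id=100, end_id=101)
reasoning_tokens = tokens[span.start:span.end] if span else []
```

Using half-open indices keeps the span directly usable as a Python slice, which is why the delimiters are left outside it.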

Tool parsing

ParsedToolCall: A parsed tool/function call extracted from model output.
ParsedToolCallDelta: Incremental tool call data for streaming responses.
ParsedToolResponse: Result of parsing a complete model response for tool calls.
ToolParser: Protocol for parsing tool calls from model responses.

Context and sampling

BaseContext: Core interface for request lifecycle management across all of MAX, including serving, scheduling, and pipelines.
BaseContextType: Type variable.
EOSTracker: Centralized EOS tracking: single-ID, sequence-ID, and stop-sequence checks.
GenerationOutput: Output container for image generation pipeline operations.
GenerationStatus: Enum representing the status of a generation process in the MAX API.
SamplingParams: Request-specific sampling parameters that are only known at run time.
SamplingParamsGenerationConfigDefaults: Default sampling parameter values extracted from a model's GenerationConfig.
SamplingParamsInput: Input dataclass for creating SamplingParams instances.
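
The EOSTracker entry above names three kinds of end-of-sequence checks. The sketch below shows one plausible reading of each, using only the standard library. All function names are hypothetical and are not part of the MAX API, and it is an assumption here that stop sequences are matched against decoded text rather than token IDs.

```python
from typing import Sequence


def hits_single_id(token_ids: Sequence[int], eos_ids: set[int]) -> bool:
    """Single-ID check: the most recent token is one of the EOS token IDs."""
    return bool(token_ids) and token_ids[-1] in eos_ids


def hits_id_sequence(
    token_ids: Sequence[int], eos_sequences: Sequence[Sequence[int]]
) -> bool:
    """Sequence-ID check: the output ends with a multi-token EOS sequence."""
    ids = list(token_ids)
    return any(
        len(seq) > 0 and ids[-len(seq):] == list(seq) for seq in eos_sequences
    )


def hits_stop_string(decoded_text: str, stop_strings: Sequence[str]) -> bool:
    """Stop-sequence check (assumed to run on decoded text)."""
    return any(stop and stop in decoded_text for stop in stop_strings)


def is_finished(
    token_ids: Sequence[int],
    decoded_text: str,
    eos_ids: set[int],
    eos_sequences: Sequence[Sequence[int]],
    stop_strings: Sequence[str],
) -> bool:
    """Combine the three checks; any single hit ends generation."""
    return (
        hits_single_id(token_ids, eos_ids)
        or hits_id_sequence(token_ids, eos_sequences)
        or hits_stop_string(decoded_text, stop_strings)
    )


# Illustrative values only: token IDs and strings are made up.
done_by_id = hits_single_id([5, 9, 2], eos_ids={2})
done_by_sequence = hits_id_sequence([5, 9, 2], eos_sequences=[[9, 2]])
done_by_stop = hits_stop_string("answer ###", stop_strings=["###"])
still_running = is_finished([5, 9], "answer", {2}, [[9, 2]], ["###"])
```

Centralizing the checks in one place, as the entry describes, means callers only need a single yes/no answer per step instead of repeating the three comparisons.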

Requests

OpenResponsesRequest: General request container for OpenResponses API requests.
Request: Protocol representing a generic request within the MAX API.
RequestID: A unique immutable identifier for a request.
RequestType: Type variable.
DUMMY_REQUEST_ID: A unique immutable identifier for a request.

Tokens

LogProbabilities: Log probabilities for an individual output token.
Range: Represents a range with start and end indices.
TokenBuffer: A dynamically resizable container for managing token sequences.
TokenSlice: ndarray(shape, dtype=float, buffer=None, offset=0,

Logit processors

BatchLogitsProcessor: alias of Callable[[BatchProcessorInputs], None]
BatchProcessorInputs: Arguments for a batch logits processor.
LogitsProcessor: alias of Callable[[ProcessorInputs], None]
ProcessorInputs: Inputs passed to a logits processor callback.
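
Because LogitsProcessor is an alias of a callable that returns None, a processor presumably works by modifying its inputs in place. The sketch below illustrates that callback shape with a stand-in type. `ProcessorInputsSketch`, its `logits` field, and `ban_tokens` are hypothetical: the fields of the real ProcessorInputs are not listed on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProcessorInputsSketch:
    """Hypothetical stand-in for ProcessorInputs; the field name is assumed."""

    logits: list[float]


# Same shape as the alias in the listing: takes the inputs, returns None.
LogitsProcessorSketch = Callable[[ProcessorInputsSketch], None]


def ban_tokens(banned_ids: set[int]) -> LogitsProcessorSketch:
    """Build a processor that makes the given token IDs impossible to sample."""

    def processor(inputs: ProcessorInputsSketch) -> None:
        # Mutate in place: nothing is returned to the caller.
        for token_id in banned_ids:
            if 0 <= token_id < len(inputs.logits):
                inputs.logits[token_id] = float("-inf")

    return processor


inputs = ProcessorInputsSketch(logits=[0.1, 2.0, -0.5, 1.2])
ban_tokens({1, 3})(inputs)
```

A closure is used here so that per-request configuration (the banned IDs) stays out of the callback signature, which keeps it compatible with a one-argument callable type.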

LoRA

LoRAOperation: Enum for different LoRA operations.
LoRARequest: Container for LoRA adapter requests.
LoRAResponse: Response from LoRA operations.
LoRAStatus: Enum for LoRA operation status.
LoRAType: Enumeration for LoRA types.
LORA_REQUEST_ENDPOINT: str(object='') -> str; str(bytes_or_buffer[, encoding[, errors]]) -> str
LORA_RESPONSE_ENDPOINT: str(object='') -> str; str(bytes_or_buffer[, encoding[, errors]]) -> str

Utilities

SharedMemoryArray: A wrapper for a NumPy array stored in shared memory.
msgpack_numpy_decoder: Create a decoder function for the specified type.
msgpack_numpy_encoder: Create an encoder function that handles NumPy arrays.