Python module

max.interfaces

Universal interfaces between all aspects of the MAX Inference Stack.

Pipeline base

Pipeline: Defines the interface for pipeline operations.
PipelineInputs: Base class representing inputs to a pipeline operation.
PipelineOutput: Protocol representing the output of a pipeline operation.
PipelinesFactory: Type alias for factory functions that create pipeline instances.
PipelineTask: Enum representing the types of pipeline tasks supported.
PipelineTokenizer: Interface for LLM tokenizers.

Text generation

BatchType: Type of batch.
MessageContent: A PEP 604 union type.
TextContentPart: A plain-text content part of a message.
TextGenerationContext: Protocol defining the interface for text generation contexts in token generation.
TextGenerationInputs: Input parameters for text generation pipeline operations.
TextGenerationOutput: Represents the output of a text generation operation.
TextGenerationRequest: An immutable request for text token generation from a pipeline.
TextGenerationRequestFunction: Represents a function definition for a text generation request.
TextGenerationRequestMessage: A single message in a text generation request conversation.
TextGenerationRequestTool: Represents a tool definition for a text generation request.
TextGenerationResponseFormat: Represents the response format specification for a text generation request.
VLMTextGenerationContext: Protocol defining the interface for VLM input contexts.

Embeddings

EmbeddingsContext: Protocol defining the interface for embeddings generation contexts.
EmbeddingsGenerationInputs: Batched inputs for an embeddings generation pipeline step.
EmbeddingsGenerationOutput: Response structure for embedding generation.

Audio generation

AudioGenerationInputs: Input data structure for audio generation pipelines.
AudioGenerationMetadata: Represents metadata associated with audio generation.
AudioGenerationOutput: Represents a response from the audio generation API.
AudioGenerationRequest: An immutable request for audio generation from a pipeline.

Image generation

ImageContentPart: An image content part of a message.
ImageMetadata: Metadata about an image in the prompt.
PixelGenerationContext: Protocol defining the interface for pixel generation contexts.
PixelGenerationInputs: Input data structure for pixel generation pipelines.
VideoContentPart: A video content part of a message.

Reasoning

ReasoningParser: Parser for identifying reasoning spans in model output.
ReasoningSpan: Identifies a reasoning span within a token ID sequence.

Tool parsing

ParsedToolCall: A parsed tool/function call extracted from model output.
ParsedToolCallDelta: Incremental tool call data for streaming responses.
ParsedToolResponse: Result of parsing a complete model response for tool calls.
ToolParser: Protocol for parsing tool calls from model responses.

Context and sampling

BaseContext: Core interface for request lifecycle management across all of MAX, including serving, scheduling, and pipelines.
GenerationOutput: Output container for image generation pipeline operations.
GenerationStatus: Enum representing the status of a generation process in the MAX API.
SamplingParams: Request-specific sampling parameters that are known only at run time.
SamplingParamsGenerationConfigDefaults: Default sampling parameter values extracted from a model's GenerationConfig.
SamplingParamsInput: Input dataclass for creating SamplingParams instances.

Requests and scheduling

OpenResponsesRequest: General request container for OpenResponses API requests.
Request: Protocol representing a generic request within the MAX API.
RequestID: A unique immutable identifier for a request.
Scheduler: Abstract base class defining the interface for schedulers.
SchedulerResult: Structure representing the result of a scheduler operation for a specific pipeline execution.

Tokens

LogProbabilities: Log probabilities for an individual output token.
TokenBuffer: A dynamically resizable container for managing token sequences.
TokenSlice: ndarray(shape, dtype=float, buffer=None, offset=0,

Logit processors

BatchLogitsProcessor: Alias of Callable[[BatchProcessorInputs], None].
BatchProcessorInputs: Arguments for a batch logits processor.
LogitsProcessor: Alias of Callable[[ProcessorInputs], None].
ProcessorInputs: Inputs passed to a logits processor callback.
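
Since LogitsProcessor is an alias of a callable that takes the inputs and returns None, a processor presumably works by mutating its inputs in place. The following is a minimal sketch of that callback shape, not MAX code: the `ProcessorInputs` dataclass here is a hypothetical stand-in, and its single `logits` field is an assumption, because the real class's fields are not listed in this index.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-in for ProcessorInputs; the `logits` field is an
# assumption made for illustration only.
@dataclass
class ProcessorInputs:
    logits: List[float]


# Mirrors the documented alias shape: Callable[[ProcessorInputs], None].
LogitsProcessor = Callable[[ProcessorInputs], None]


def make_ban_token_processor(token_id: int) -> LogitsProcessor:
    """Build a processor that masks one token by editing logits in place."""

    def processor(inputs: ProcessorInputs) -> None:
        # Returning None means the only effect is the in-place mutation.
        inputs.logits[token_id] = float("-inf")

    return processor


inputs = ProcessorInputs(logits=[0.1, 2.0, -0.5])
ban_second = make_ban_token_processor(1)
result = ban_second(inputs)
```

The closure form keeps per-processor configuration (here, the banned token ID) out of the call signature, so every processor still matches the same one-argument alias.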

LoRA

LoRAOperation: Enum for different LoRA operations.
LoRARequest: Container for LoRA adapter requests.
LoRAResponse: Response from LoRA operations.
LoRAStatus: Enum for LoRA operation status.
LoRAType: Enumeration for LoRA types.

Queues

MAXPullQueue: Protocol for a minimal, non-blocking pull queue interface in MAX.
MAXPushQueue: Protocol for a minimal, non-blocking push queue interface in MAX.
drain_queue: Remove and return items from the queue without blocking.
get_blocking: Get the next item from the queue.
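
The idea of draining a queue without blocking can be sketched with the standard library's `queue.Queue`. This is an illustration of the general technique only: `drain_nowait` and its `max_items` parameter are hypothetical names, and the signature of MAX's own `drain_queue` is not given in this index.

```python
import queue
from typing import List, Optional, TypeVar

T = TypeVar("T")


def drain_nowait(q: "queue.Queue[T]", max_items: Optional[int] = None) -> List[T]:
    """Remove and return currently available items without blocking."""
    items: List[T] = []
    while max_items is None or len(items) < max_items:
        try:
            # get_nowait raises queue.Empty instead of waiting for an item.
            items.append(q.get_nowait())
        except queue.Empty:
            break
    return items


q: "queue.Queue[int]" = queue.Queue()
for i in range(5):
    q.put(i)

first = drain_nowait(q, max_items=2)  # takes at most two items
rest = drain_nowait(q)                # takes whatever is left
empty = drain_nowait(q)               # nothing left, returns immediately
```

Catching `queue.Empty` rather than checking `q.empty()` first avoids a race when another thread consumes from the same queue between the check and the read.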

Utilities

SharedMemoryArray: A wrapper for a NumPy array stored in shared memory.
msgpack_numpy_decoder: Create a decoder function for the specified type.
msgpack_numpy_encoder: Create an encoder function that handles numpy arrays.