
Python module

max.interfaces

Universal interfaces between all aspects of the MAX Inference Stack.

Pipeline base

Pipeline: Abstract base class for pipeline operations.
PipelineInputs: Base class representing inputs to a pipeline operation.
PipelineOutput: Protocol representing the output of a pipeline operation.
PipelinesFactory: Type alias for factory functions that create pipeline instances.
PipelineTask: Enum representing the types of pipeline tasks supported.
PipelineTokenizer: Interface for LLM tokenizers.

Text generation

BatchType: Type of batch.
MessageContent: Represents a PEP 604 union type.
TextContentPart: A plain-text content part of a message.
TextGenerationContext: Protocol defining the interface for text generation contexts in token generation.
TextGenerationInputs: Input parameters for text generation pipeline operations.
TextGenerationOutput: Represents the output of a text generation operation.
TextGenerationRequest: An immutable request for text token generation from a pipeline.
TextGenerationRequestFunction: Represents a function definition for a text generation request.
TextGenerationRequestMessage: A single message in a text generation request conversation.
TextGenerationRequestTool: Represents a tool definition for a text generation request.
TextGenerationResponseFormat: Represents the response format specification for a text generation request.
VLMTextGenerationContext: Protocol defining the interface for VLM input contexts.

Embeddings

EmbeddingsContext: Protocol defining the interface for embeddings generation contexts.
EmbeddingsGenerationInputs: Batched inputs for an embeddings generation pipeline step.
EmbeddingsGenerationOutput: Response structure for embedding generation.

Audio generation

AudioGenerationInputs: Input data structure for audio generation pipelines.
AudioGenerationMetadata: Represents metadata associated with audio generation.
AudioGenerationOutput: Represents a response from the audio generation API.
AudioGenerationRequest: An immutable request for audio generation from a pipeline.

Image generation

ImageContentPart: An image content part of a message.
ImageMetadata: Metadata about an image in the prompt.
PixelGenerationContext: Protocol defining the interface for pixel generation contexts.
PixelGenerationInputs: Input data structure for pixel generation pipelines.

Context and sampling

BaseContext: Core interface for request lifecycle management across all of MAX, including serving, scheduling, and pipelines.
GenerationOutput: Output container for image generation pipeline operations.
GenerationStatus: Enum representing the status of a generation process in the MAX API.
SamplingParams: Request-specific sampling parameters that are only known at run time.
SamplingParamsGenerationConfigDefaults: Default sampling parameter values extracted from a model's GenerationConfig.
SamplingParamsInput: Input dataclass for creating SamplingParams instances.

Requests and scheduling

OpenResponsesRequest: General request container for OpenResponses API requests.
Request: Protocol representing a generic request within the MAX API.
RequestID: A unique immutable identifier for a request.
Scheduler: Abstract base class defining the interface for schedulers.
SchedulerResult: Structure representing the result of a scheduler operation for a specific pipeline execution.

Tokens

LogProbabilities: Log probabilities for an individual output token.
TokenBuffer: A dynamically resizable container for managing token sequences.
TokenSlice: ndarray(shape, dtype=float, buffer=None, offset=0,

Logit processors

BatchLogitsProcessor: alias of Callable[[BatchProcessorInputs], None]
BatchProcessorInputs: Arguments for a batch logits processor.
LogitsProcessor: alias of Callable[[ProcessorInputs], None]
ProcessorInputs: Inputs passed to a logits processor callback.
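
Both aliases above describe a callback that takes an inputs object and returns None, which suggests a processor does its work by modifying the logits in place. The sketch below illustrates only that callback shape. `FakeProcessorInputs`, its `logits` field, and `ban_token` are hypothetical stand-ins invented for this example; the fields of the real `ProcessorInputs` are not listed on this page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeProcessorInputs:
    """Hypothetical stand-in; not the real ProcessorInputs."""
    logits: List[float]


# Same shape as the alias listed above: Callable[[inputs], None].
FakeLogitsProcessor = Callable[[FakeProcessorInputs], None]


def ban_token(token_id: int) -> FakeLogitsProcessor:
    """Build a processor that makes one token impossible to sample."""
    def processor(inputs: FakeProcessorInputs) -> None:
        # Returns nothing: the effect is the in-place edit of the logits.
        inputs.logits[token_id] = float("-inf")
    return processor


inputs = FakeProcessorInputs(logits=[0.1, 2.5, -0.3])
ban_token(1)(inputs)
```

Because the callback returns None, several processors of this shape can be applied one after another to the same inputs object without threading a return value through.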

LoRA

LoRAOperation: Enum for different LoRA operations.
LoRARequest: Container for LoRA adapter requests.
LoRAResponse: Response from LoRA operations.
LoRAStatus: Enum for LoRA operation status.
LoRAType: Enumeration for LoRA types.

Queues

MAXPullQueue: Protocol for a minimal, non-blocking pull queue interface in MAX.
MAXPushQueue: Protocol for a minimal, non-blocking push queue interface in MAX.
drain_queue: Remove and return items from the queue without blocking.
get_blocking: Get the next item from the queue.
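
The idea of draining a queue without blocking can be sketched with the standard library alone. The example below uses `queue.Queue` as a stand-in for a pull queue. `drain_sketch` and its `max_items` parameter are hypothetical names for illustration, and the actual method names and signatures of `MAXPullQueue` and `drain_queue` are not shown on this page and may differ.

```python
import queue
from typing import Any, List, Optional


def drain_sketch(q: "queue.Queue[Any]", max_items: Optional[int] = None) -> List[Any]:
    """Pop items until the queue is empty or max_items is reached, never blocking."""
    items: List[Any] = []
    while max_items is None or len(items) < max_items:
        try:
            # get_nowait() raises queue.Empty instead of waiting for an item.
            items.append(q.get_nowait())
        except queue.Empty:
            break
    return items


q: "queue.Queue[int]" = queue.Queue()
for i in range(5):
    q.put(i)

first = drain_sketch(q, max_items=2)  # takes at most two items
rest = drain_sketch(q)                # takes whatever is left
```

A non-blocking drain like this lets a scheduler loop collect all pending requests in one pass and move on when none are waiting.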

Utilities

SharedMemoryArray: A wrapper for a NumPy array stored in shared memory.
msgpack_numpy_decoder: Create a decoder function for the specified type.
msgpack_numpy_encoder: Create an encoder function that handles numpy arrays.