Python module

max.pipelines

Types to interface with ML pipelines such as text/token/pixel generation.

Configuration

AudioGenerationConfig: Configuration for an audio generation pipeline.
KVCacheConfig: Configuration for the paged KV cache.
LoRAConfig: Configuration for LoRA (Low-Rank Adaptation) inference.
MAXModelConfig: Configuration for a pipeline model.
PipelineConfig: Configuration for a pipeline.
ProfilingConfig: Configuration for GPU profiling of pipeline models.
SamplingConfig: Configuration for the sampling stage of token generation.
SpeculativeConfig: Configuration for speculative decoding.

Pipelines

EmbeddingsPipeline: Generalized token generator pipeline.
PixelGenerationPipeline: Pixel generation pipeline for diffusion models.
SpeechTokenGenerationPipeline: A text-to-speech token generation pipeline for TTS models.
TextGenerationPipeline: Generalized token generator pipeline.
TextGenerationPipelineInterface: Interface for text generation pipelines.

Model interface

GenerateMixin: Protocol for pipelines that support text generation.
MemoryEstimator: Estimates available memory for pipeline model allocation.
ModelInputs: Base class for model inputs.
ModelOutputs: Pipeline model outputs.
PipelineModel: A pipeline model with setup, input preparation, and execution methods.

Context

PixelContext: A model-ready context for image/video generation requests.
TextAndVisionContext: A base class for model context, specifically for Vision model variants.
TextContext: A base class for model context, specifically for Text model variants.
TTSContext: A context for Text-to-Speech (TTS) model inference.

Tokenizers

IdentityPipelineTokenizer: A pass-through tokenizer that returns prompts unchanged.
PreTrainedPipelineTokenizer: A pipeline tokenizer backed by a Hugging Face pre-trained tokenizer.
TextAndVisionTokenizer: Encapsulates creation of TextAndVisionContext and specific token encode/decode logic.
TextTokenizer: Encapsulates creation of TextContext and specific token encode/decode logic.

Enums

PipelineRole: alias of Literal['prefill_and_decode', 'prefill_only', 'decode_only']
PrometheusMetricsMode: alias of Literal['instrument_only', 'launch_server', 'launch_multiproc_server']
RepoType: alias of Literal['online', 'local']
RopeType: alias of Literal['none', 'normal', 'neox', 'longrope', 'yarn']
SupportedEncoding: alias of Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq']
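
The entries in this section are documented as `Literal` type aliases, not `enum.Enum` classes, so their values are plain strings. The sketch below shows one way such an alias could be validated at runtime. It redefines `PipelineRole` locally from the values listed above so that it runs without MAX installed. `validate_role` is a hypothetical helper for illustration and is not part of `max.pipelines`.

```python
from typing import Literal, get_args

# Local mirror of the documented alias (an assumption for this sketch);
# real code would import the alias from max.pipelines instead.
PipelineRole = Literal["prefill_and_decode", "prefill_only", "decode_only"]


def validate_role(value: str) -> str:
    """Hypothetical helper: return `value` if it is a documented role, else raise."""
    allowed = get_args(PipelineRole)  # tuple of the literal strings
    if value not in allowed:
        raise ValueError(
            f"unknown pipeline role {value!r}; expected one of {allowed}"
        )
    return value
```

Because the alias is a `Literal`, a static type checker can also flag misspelled role strings without any runtime check.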

Utilities

PrependPromptSpeechTokens: alias of Literal['never', 'once', 'rolling']
download_weight_files: Downloads weight files for a Hugging Face model and returns local paths.
is_float4_encoding: Returns whether the given encoding is a float4 type.
parse_supported_encoding_from_file_name: Infers a SupportedEncoding from a file name string.
supported_encoding_dtype: Returns the underlying model dtype for the given encoding.
supported_encoding_quantization: Returns the QuantizationEncoding for the given encoding.
supported_encoding_supported_devices: Returns the devices that the given encoding is supported on.
supported_encoding_supported_on: Returns whether the given encoding is supported on a device.
upper_bounded_default: Returns a value not exceeding the upper bound.
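
To illustrate the idea of inferring an encoding from a file name, here is a minimal standalone sketch. It is not the implementation of `parse_supported_encoding_from_file_name`, whose signature and matching rules are not given on this page. The function `guess_encoding_from_file_name` and its substring-matching rule are assumptions made for illustration, and `SupportedEncoding` is redefined locally from the values listed under Enums.

```python
from typing import Literal, Optional, get_args

# Local mirror of the documented alias (an assumption for this sketch).
SupportedEncoding = Literal[
    "float32", "bfloat16", "q4_k", "q4_0", "q6_k",
    "float8_e4m3fn", "float4_e2m1fnx2", "gptq",
]


def guess_encoding_from_file_name(name: str) -> Optional[str]:
    """Hypothetical heuristic: return the first documented encoding name
    that appears as a substring of the lowercased file name, or None."""
    lowered = name.lower()
    for encoding in get_args(SupportedEncoding):
        if encoding in lowered:
            return encoding
    return None


print(guess_encoding_from_file_name("llama-3.1-8b-instruct-q4_k_m.gguf"))
print(guess_encoding_from_file_name("README.md"))
```

A plain substring match only recognizes the exact encoding names listed above; abbreviations such as `bf16` would need additional rules, and the library's own function may behave differently.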

Submodules