Python module

max.pipelines.lib

Types to interface with ML pipelines such as text/token generation.

Configuration

MAXConfig: Abstract base class for all MAX configs.
MAXModelConfigBase: Abstract base class for MAX model configuration.
PipelineRuntimeConfig: Model-agnostic runtime settings for pipeline execution.

Pipelines

EmbeddingsPipelineType: alias of Pipeline[EmbeddingsGenerationInputs, EmbeddingsGenerationOutput]
OverlapTextGenerationPipeline: Overlap text generation pipeline.
StandaloneSpeculativeDecodingPipeline: Standalone speculative decoding where draft model runs independently.

Model interface

AlwaysSignalBuffersMixin: Mixin for models that always require signal buffers.
PipelineModelWithKVCache: A pipeline model that supports KV cache.

Tokenizers

PixelGenerationTokenizer: Encapsulates creation of PixelContext and specific token encode/decode logic.

LoRA

LoRAManager: Manages multiple LoRA models and buffers for the forward pass.
LoRARequestProcessor: Processes LoRA requests by delegating operations to a LoRAManager.

Utilities

CompilationTimer: Timer for logging graph build and compilation phases.
HuggingFaceRepo: Handle for interacting with a Hugging Face repository (remote or local).
WeightPathParser: Parses and validates weight paths for model configuration.

Functions

convert_max_config_value: Converts a config value to the appropriate type.
deep_merge_max_configs: Deep merge two MAXConfig configuration dictionaries.
float32_to_bfloat16_as_uint16: Converts a float32 array to bfloat16 representation stored as uint16.
generate_local_model_path: Generates the local filesystem path where a Hugging Face model repo is cached.
get_default_max_config_file_section_name: Gets the default section name for a MAXConfig class.
max_tokens_to_generate: Returns the max number of new tokens to generate.
parse_quant_config: Parses scaled quantization config from HuggingFace config and state dict.
rejection_sampler: Builds a graph that implements speculative decoding rejection sampling.
rejection_sampler_with_residuals: Builds a rejection sampler with residual sampling for speculative decoding.
resolve_max_config_inheritance: Resolves configuration inheritance by loading base config and merging.
token_sampler: Builds a sampling graph that samples tokens from logits.
try_to_load_from_cache: Wrapper around huggingface_hub.try_to_load_from_cache; validates repo exists.
validate_hf_repo_access: Validates repository access and raises clear, user-friendly errors.
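
To make the "bfloat16 representation stored as uint16" idea from the float32_to_bfloat16_as_uint16 entry concrete, here is a minimal NumPy sketch of the general technique. It is not the library's implementation: the helper names f32_to_bf16_bits_sketch and bf16_bits_to_f32_sketch are hypothetical, and the rounding mode (round to nearest, ties to even) and NaN handling are assumptions, since this page does not say how the real function treats them.

```python
import numpy as np


def f32_to_bf16_bits_sketch(values):
    """Hypothetical helper (not the MAX API): float32 -> bfloat16 bits as uint16.

    bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa
    bits of a float32, i.e. the upper 16 bits of its 32-bit pattern.
    """
    f32 = np.ascontiguousarray(values, dtype=np.float32)
    bits = f32.view(np.uint32)  # reinterpret the same bytes as integers

    # Assumed rounding mode: round to nearest, ties to even. Adding 0x7FFF
    # plus the lowest kept bit carries into the upper half when needed.
    lowest_kept_bit = (bits >> 16) & 1
    rounded = (bits + 0x7FFF + lowest_kept_bit) >> 16
    out = rounded.astype(np.uint16)

    # The rounding add can corrupt NaN payloads, so write a canonical quiet NaN.
    out[np.isnan(f32)] = 0x7FC0
    return out


def bf16_bits_to_f32_sketch(bits):
    """Hypothetical inverse: widen uint16 bfloat16 bits back to float32."""
    widened = np.ascontiguousarray(bits, dtype=np.uint16).astype(np.uint32) << 16
    return widened.view(np.float32)


encoded = f32_to_bf16_bits_sketch([1.0, -2.0, 0.0])
decoded = bf16_bits_to_f32_sketch(encoded)
```

Values that bfloat16 can represent exactly, such as 1.0 and -2.0 here, survive the round trip unchanged; other values lose the low 16 mantissa bits, so the decoded array is only an approximation of the input.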

Submodules