
Python module

max.pipelines.lib

Types to interface with ML pipelines such as text/token generation.

Configuration

DenoisingCacheConfig: Pipeline-level cache configuration for diffusion model denoising.
KVConnectorConfig: Connector-specific configuration for KV cache connectors.
MAXConfig: Abstract base class for all MAX configs.
MAXModelConfigBase: Abstract base class for MAX model configuration.
PipelineRuntimeConfig: Model-agnostic runtime settings for pipeline execution.

Pipelines

EmbeddingsPipelineType: alias of Pipeline[EmbeddingsGenerationInputs, EmbeddingsGenerationOutput]
OverlapTextGenerationPipeline: Overlap text generation pipeline.
StandaloneSpeculativeDecodingPipeline: Standalone speculative decoding where draft model runs independently.

Model interface

AlwaysSignalBuffersMixin: Mixin for models that always require signal buffers.
PipelineModelWithKVCache: A pipeline model that supports KV cache.
UnifiedEagleOutputs: Outputs from a unified EAGLE graph execution.

Tokenizers

PixelGenerationTokenizer: Encapsulates creation of PixelContext and specific token encode/decode logic.

LoRA

LoRAManager: Manages multiple LoRA models and buffers for the forward pass.
LoRARequestProcessor: Processes LoRA requests by delegating operations to a LoRAManager.

Utilities

CompilationTimer: Timer for logging graph build, compile, and init phases.
HuggingFaceRepo: Handle for interacting with a Hugging Face repository (remote or local).
ModelManifest: Registry mapping semantic role strings to MAXModelConfig instances.
WeightPathParser: Parses and validates weight paths for model configuration.

Functions

build_eos_tracker_for_request: Builds an EOSTracker from request sampling params.
convert_max_config_value: Converts a config value to the appropriate type.
deep_merge_max_configs: Deep merge two MAXConfig configuration dictionaries.
float32_array_to_buffer: Create a device buffer from float32 host data without a cast-only graph.
float32_to_bfloat16_as_uint16: Converts a float32 array to bfloat16 representation stored as uint16.
generate_local_model_path: Generates the local filesystem path where a Hugging Face model repo is cached.
get_default_max_config_file_section_name: Gets the default section name for a MAXConfig class.
max_tokens_to_generate: Returns the max number of new tokens to generate.
parse_quant_config: Parses scaled quantization config from HuggingFace config and state dict.
rejection_sampler: Builds a graph that implements speculative decoding rejection sampling.
rejection_sampler_with_residuals: Builds a rejection sampler with residual sampling for speculative decoding.
resolve_max_config_inheritance: Resolves configuration inheritance by loading base config and merging.
token_sampler: Builds a sampling graph that samples tokens from logits.
try_to_load_from_cache: Wrapper around huggingface_hub.try_to_load_from_cache; validates repo exists.
validate_hf_repo_access: Validates repository access and raises clear, user-friendly errors.
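
To illustrate what "bfloat16 representation stored as uint16" means in the float32_to_bfloat16_as_uint16 entry above, here is a minimal NumPy-only sketch. It is not the library's implementation: the helper name f32_to_bf16_bits is hypothetical, and both the round-to-nearest-even behavior and the canonical NaN pattern are assumptions, since this page does not say how the real function rounds or treats NaN. The idea is that bfloat16 keeps the upper 16 bits of the float32 bit pattern (sign, 8 exponent bits, 7 mantissa bits), so those bits fit in a uint16.

```python
import numpy as np


def f32_to_bf16_bits(values):
    """Hypothetical helper: bfloat16 bit patterns of float32 data, as uint16.

    Assumes round-to-nearest-even; the real library function may differ.
    """
    f32 = np.ascontiguousarray(values, dtype=np.float32)
    bits = f32.view(np.uint32)  # reinterpret the raw bits, no numeric cast

    # Round to nearest even: add 0x7FFF plus the lowest bit that survives.
    lsb = (bits >> 16) & 1
    rounded = bits + (np.uint32(0x7FFF) + lsb)

    out = (rounded >> 16).astype(np.uint16)
    # Assumption: map every NaN to one quiet-NaN pattern so rounding
    # cannot carry a NaN payload into the exponent or sign bits.
    out[np.isnan(f32)] = 0x7FC0
    return out


def bf16_bits_to_f32(bits16):
    """Inverse for inspection: widen the uint16 patterns back to float32."""
    widened = np.ascontiguousarray(bits16, dtype=np.uint16).astype(np.uint32) << 16
    return widened.view(np.float32)
```

Values that bfloat16 can represent exactly, such as 1.0 and -2.0, survive a round trip through bf16_bits_to_f32 unchanged. Other values lose their low 16 mantissa bits to rounding.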

Submodules