IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /get-started.md). For the complete documentation index, see llms.txt.


Python module

max.pipelines.lib

Types to interface with ML pipelines such as text/token generation.

Submodules

Configuration

`DenoisingCacheConfig`	Pipeline-level cache configuration for diffusion model denoising.
`KVConnectorConfig`	Connector-specific configuration for KV cache connectors.
`MAXConfig`	Abstract base class for all MAX configs.
`MAXModelConfigBase`	Abstract base class for MAX model configuration.
`PipelineArgs`	Flat, user-settable input arguments for a pipeline.
`PipelineRuntimeConfig`	Model-agnostic runtime settings for pipeline execution.

Pipelines

`EmbeddingsPipelineType`	alias of `Pipeline`[`EmbeddingsGenerationInputs`, `EmbeddingsGenerationOutput`]
`OverlapTextGenerationPipeline`	Overlap text generation pipeline.

Model interface

`AlwaysSignalBuffersMixin`	Mixin for models that always require signal buffers.
`PipelineModelWithKVCache`	A pipeline model that supports KV cache.
`UnifiedEagleOutputs`	Outputs from a unified EAGLE graph execution.

Tokenizers

`PixelGenerationTokenizer`	Encapsulates creation of PixelContext and specific token encode/decode logic.

LoRA

`LoRAManager`	Manages multiple LoRA models and buffers for the forward pass.

Utilities

`CompilationTimer`	Timer for logging graph build, compile, and init phases.
`HuggingFaceRepo`	Handle for interacting with a Hugging Face repository (remote or local).
`ModelManifest`	Registry mapping semantic role strings to MAXModelConfig instances.
`WeightPathParser`	Parses and validates weight paths for model configuration.

Functions

`build_eos_tracker_for_request`	Builds an `EOSTracker` from request sampling params.
`convert_max_config_value`	Converts a config value to the appropriate type.
`deep_merge_max_configs`	Deep merge two MAXConfig configuration dictionaries.
`float32_array_to_buffer`	Create a device buffer from float32 host data without a cast-only graph.
`float32_to_bfloat16_as_uint16`	Converts a float32 array to bfloat16 representation stored as uint16.
`generate_local_model_path`	Generates the local filesystem path where a Hugging Face model repo is cached.
`get_default_max_config_file_section_name`	Gets the default section name for a MAXConfig class.
`max_tokens_to_generate`	Returns the max number of new tokens to generate.
`parse_quant_config`	Parses scaled quantization config from HuggingFace config and state dict.
`resolve_max_config_inheritance`	Resolves configuration inheritance by loading base config and merging.
`try_to_load_from_cache`	Wrapper around `huggingface_hub.try_to_load_from_cache`; validates repo exists.
`validate_hf_repo_access`	Validates repository access and raises clear, user-friendly errors.
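
The `float32_to_bfloat16_as_uint16` entry above describes storing bfloat16 values in a uint16 container. As a rough illustration of that general technique (not the library's implementation), the sketch below keeps the upper 16 bits of each float32 bit pattern. It assumes round-to-nearest-even, which this page does not specify, and the helper names `to_bfloat16_bits` and `bfloat16_bits_to_float` are hypothetical. It uses only the Python standard library rather than the array types the real function operates on.

```python
import struct
from array import array


def to_bfloat16_bits(values):
    """Return bfloat16 bit patterns (as unsigned 16-bit ints) for float32 values.

    Hypothetical helper for illustration only. Assumes round-to-nearest-even.
    """
    out = array("H")  # unsigned 16-bit storage
    for v in values:
        # Reinterpret the float32 value as its raw 32-bit pattern.
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
        if is_nan:
            # Truncate and set a mantissa bit so the result remains a NaN.
            out.append((bits >> 16) | 0x0040)
        else:
            # Add a bias so that dropping the low 16 bits rounds to nearest,
            # with ties going to the even upper half.
            rounding = 0x7FFF + ((bits >> 16) & 1)
            out.append(((bits + rounding) >> 16) & 0xFFFF)
    return out


def bfloat16_bits_to_float(b):
    """Expand a bfloat16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    encoded = to_bfloat16_bits([1.0, -2.0, 3.14159])
    print([hex(b) for b in encoded])
    print([bfloat16_bits_to_float(b) for b in encoded])
```

bfloat16 keeps float32's 8 exponent bits and shortens the mantissa to 7 bits, so the conversion preserves range and loses precision. A uint16 container is a common workaround when the host array library has no native bfloat16 dtype.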
