Python module

max.pipelines.lib

Types to interface with ML pipelines such as text/token generation.

Configuration

DenoisingCacheConfig: Pipeline-level cache configuration for diffusion model denoising.
KVConnectorConfig: Connector-specific configuration for KV cache connectors.
MAXConfig: Abstract base class for all MAX configs.
MAXModelConfigBase: Abstract base class for MAX model configuration.
PipelineRuntimeConfig: Model-agnostic runtime settings for pipeline execution.

Pipelines

EmbeddingsPipelineType: Alias of Pipeline[EmbeddingsGenerationInputs, EmbeddingsGenerationOutput].
OverlapTextGenerationPipeline: Overlap text generation pipeline.
StandaloneSpeculativeDecodingPipeline: Standalone speculative decoding where the draft model runs independently.

Model interface

AlwaysSignalBuffersMixin: Mixin for models that always require signal buffers.
PipelineModelWithKVCache: A pipeline model that supports KV cache.
UnifiedEagleOutputs: Outputs from a unified EAGLE graph execution.

Tokenizers

PixelGenerationTokenizer: Encapsulates creation of PixelContext and specific token encode/decode logic.

LoRA

LoRAManager: Manages multiple LoRA models and buffers for the forward pass.
LoRARequestProcessor: Processes LoRA requests by delegating operations to a LoRAManager.

Utilities

CompilationTimer: Timer for logging graph build and compilation phases.
HuggingFaceRepo: Handle for interacting with a Hugging Face repository (remote or local).
ModelManifest: Registry mapping semantic role strings to MAXModelConfig instances.
WeightPathParser: Parses and validates weight paths for model configuration.

Functions

build_eos_tracker_for_request: Builds an EOSTracker from request sampling params.
convert_max_config_value: Converts a config value to the appropriate type.
deep_merge_max_configs: Deep-merges two MAXConfig configuration dictionaries.
float32_array_to_buffer: Creates a device buffer from float32 host data without a cast-only graph.
float32_to_bfloat16_as_uint16: Converts a float32 array to its bfloat16 representation, stored as uint16.
generate_local_model_path: Generates the local filesystem path where a Hugging Face model repo is cached.
get_default_max_config_file_section_name: Gets the default section name for a MAXConfig class.
max_tokens_to_generate: Returns the maximum number of new tokens to generate.
parse_quant_config: Parses the scaled quantization config from a Hugging Face config and state dict.
rejection_sampler: Builds a graph that implements speculative decoding rejection sampling.
rejection_sampler_with_residuals: Builds a rejection sampler with residual sampling for speculative decoding.
resolve_max_config_inheritance: Resolves configuration inheritance by loading the base config and merging.
token_sampler: Builds a sampling graph that samples tokens from logits.
try_to_load_from_cache: Wrapper around huggingface_hub.try_to_load_from_cache; validates that the repo exists.
validate_hf_repo_access: Validates repository access and raises clear, user-friendly errors.
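
To make the float32_to_bfloat16_as_uint16 entry above more concrete, here is a minimal NumPy sketch of what "bfloat16 stored as uint16" can mean: the upper 16 bits of each float32 bit pattern are kept. This is not the library's implementation. The helper names f32_to_bf16_bits and bf16_bits_to_f32 are hypothetical, and the rounding mode (round to nearest, ties to even) is an assumption, since the listing does not say how the real function rounds.

```python
import numpy as np


def f32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: float32 values -> bfloat16 bit patterns as uint16.

    Assumes round-to-nearest-even; the real library function may differ.
    """
    x = np.ascontiguousarray(x, dtype=np.float32)
    # Reinterpret the float32 bits, widened to uint64 so the add cannot wrap.
    bits = x.view(np.uint32).astype(np.uint64)
    # Lowest bit of the 16 bits we keep; used to break ties toward even.
    lsb = (bits >> 16) & 1
    rounded = (bits + 0x7FFF + lsb) >> 16
    out = rounded.astype(np.uint16)
    # Rounding can corrupt NaN payloads, so map every NaN to a quiet NaN.
    return np.where(np.isnan(x), np.uint16(0x7FC0), out)


def bf16_bits_to_f32(b: np.ndarray) -> np.ndarray:
    """Hypothetical inverse: widen bfloat16 bit patterns back to float32."""
    b = np.ascontiguousarray(b, dtype=np.uint16)
    return (b.astype(np.uint32) << 16).view(np.float32)


values = np.array([1.0, -2.0, 3.140625, 0.0], dtype=np.float32)
encoded = f32_to_bf16_bits(values)
decoded = bf16_bits_to_f32(encoded)
```

The sample values were chosen to be exactly representable in bfloat16, so decoding returns them unchanged; arbitrary float32 inputs lose their low 16 mantissa bits.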

Submodules