Python module
max.pipelines.lib
Types to interface with ML pipelines such as text/token generation.
Configuration
| Name | Description |
|---|---|
| MAXConfig | Abstract base class for all MAX configs. |
| MAXModelConfigBase | Abstract base class for MAX model configuration. |
| PipelineRuntimeConfig | Model-agnostic runtime settings for pipeline execution. |
Pipelines
| Name | Description |
|---|---|
| EmbeddingsPipelineType | Alias of Pipeline[EmbeddingsGenerationInputs, EmbeddingsGenerationOutput]. |
| OverlapTextGenerationPipeline | Overlap text generation pipeline. |
| StandaloneSpeculativeDecodingPipeline | Standalone speculative decoding where the draft model runs independently. |
Model interface
| Name | Description |
|---|---|
| AlwaysSignalBuffersMixin | Mixin for models that always require signal buffers. |
| PipelineModelWithKVCache | A pipeline model that supports KV cache. |
Tokenizers
| Name | Description |
|---|---|
| PixelGenerationTokenizer | Encapsulates creation of PixelContext and specific token encode/decode logic. |
LoRA
| Name | Description |
|---|---|
| LoRAManager | Manages multiple LoRA models and buffers for the forward pass. |
| LoRARequestProcessor | Processes LoRA requests by delegating operations to a LoRAManager. |
Utilities
| Name | Description |
|---|---|
| CompilationTimer | Timer for logging graph build and compilation phases. |
| HuggingFaceRepo | Handle for interacting with a Hugging Face repository (remote or local). |
| WeightPathParser | Parses and validates weight paths for model configuration. |
Functions
| Name | Description |
|---|---|
| convert_max_config_value | Converts a config value to the appropriate type. |
| deep_merge_max_configs | Deep merges two MAXConfig configuration dictionaries. |
| float32_to_bfloat16_as_uint16 | Converts a float32 array to bfloat16 representation stored as uint16. |
| generate_local_model_path | Generates the local filesystem path where a Hugging Face model repo is cached. |
| get_default_max_config_file_section_name | Gets the default section name for a MAXConfig class. |
| max_tokens_to_generate | Returns the max number of new tokens to generate. |
| parse_quant_config | Parses scaled quantization config from a Hugging Face config and state dict. |
| rejection_sampler | Builds a graph that implements speculative decoding rejection sampling. |
| rejection_sampler_with_residuals | Builds a rejection sampler with residual sampling for speculative decoding. |
| resolve_max_config_inheritance | Resolves configuration inheritance by loading the base config and merging. |
| token_sampler | Builds a sampling graph that samples tokens from logits. |
| try_to_load_from_cache | Wrapper around huggingface_hub.try_to_load_from_cache; validates that the repo exists. |
| validate_hf_repo_access | Validates repository access and raises clear, user-friendly errors. |
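To illustrate what "bfloat16 representation stored as uint16" means for float32_to_bfloat16_as_uint16, here is a minimal standard-library sketch of the general bit-level technique. It is not the library's implementation: the helper name `float32_to_bfloat16_bits` is hypothetical, it works on a single value rather than an array, and the round-to-nearest-even rounding and the canonical NaN pattern `0x7FC0` are assumptions (the actual function may truncate or handle NaN differently).

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Return the bfloat16 bit pattern of `value` as an int in [0, 0xFFFF].

    Hypothetical helper for illustration only; rounding mode is an assumption.
    """
    # Reinterpret the float32 encoding of `value` as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]

    # NaN: exponent all ones and a non-zero mantissa. Return a canonical quiet
    # NaN so that rounding cannot carry into the exponent or sign bits.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return 0x7FC0

    # bfloat16 keeps the upper 16 bits of float32. Round to nearest even by
    # adding 0x7FFF plus the lowest kept bit before dropping the lower 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


if __name__ == "__main__":
    for x in (1.0, -2.0, 1.00390625, 1.01171875):
        print(x, hex(float32_to_bfloat16_bits(x)))
```

An array version would apply the same steps to every element and store the results with a 16-bit unsigned dtype.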
Submodules