Python module
max.pipelines.lib
Types to interface with ML pipelines such as text/token generation.
Configuration
Name | Description |
|---|---|
DenoisingCacheConfig | Pipeline-level cache configuration for diffusion model denoising. |
KVConnectorConfig | Connector-specific configuration for KV cache connectors. |
MAXConfig | Abstract base class for all MAX configs. |
MAXModelConfigBase | Abstract base class for MAX model configuration. |
PipelineRuntimeConfig | Model-agnostic runtime settings for pipeline execution. |
Pipelines
Name | Description |
|---|---|
EmbeddingsPipelineType | Alias of Pipeline[EmbeddingsGenerationInputs, EmbeddingsGenerationOutput]. |
OverlapTextGenerationPipeline | Overlap text generation pipeline. |
StandaloneSpeculativeDecodingPipeline | Standalone speculative decoding where draft model runs independently. |
Model interface
Name | Description |
|---|---|
AlwaysSignalBuffersMixin | Mixin for models that always require signal buffers. |
PipelineModelWithKVCache | A pipeline model that supports KV cache. |
UnifiedEagleOutputs | Outputs from a unified EAGLE graph execution. |
Tokenizers
Name | Description |
|---|---|
PixelGenerationTokenizer | Encapsulates creation of PixelContext and specific token encode/decode logic. |
LoRA
Name | Description |
|---|---|
LoRAManager | Manages multiple LoRA models and buffers for the forward pass. |
LoRARequestProcessor | Processes LoRA requests by delegating operations to a LoRAManager. |
Utilities
Name | Description |
|---|---|
CompilationTimer | Timer for logging graph build and compilation phases. |
HuggingFaceRepo | Handle for interacting with a Hugging Face repository (remote or local). |
ModelManifest | Registry mapping semantic role strings to MAXModelConfig instances. |
WeightPathParser | Parses and validates weight paths for model configuration. |
Functions
Name | Description |
|---|---|
build_eos_tracker_for_request | Builds an EOSTracker from request sampling params. |
convert_max_config_value | Converts a config value to the appropriate type. |
deep_merge_max_configs | Deep merge two MAXConfig configuration dictionaries. |
float32_array_to_buffer | Create a device buffer from float32 host data without a cast-only graph. |
float32_to_bfloat16_as_uint16 | Converts a float32 array to bfloat16 representation stored as uint16. |
generate_local_model_path | Generates the local filesystem path where a Hugging Face model repo is cached. |
get_default_max_config_file_section_name | Gets the default section name for a MAXConfig class. |
max_tokens_to_generate | Returns the max number of new tokens to generate. |
parse_quant_config | Parses scaled quantization config from HuggingFace config and state dict. |
rejection_sampler | Builds a graph that implements speculative decoding rejection sampling. |
rejection_sampler_with_residuals | Builds a rejection sampler with residual sampling for speculative decoding. |
resolve_max_config_inheritance | Resolves configuration inheritance by loading base config and merging. |
token_sampler | Builds a sampling graph that samples tokens from logits. |
try_to_load_from_cache | Wrapper around huggingface_hub.try_to_load_from_cache; validates repo exists. |
validate_hf_repo_access | Validates repository access and raises clear, user-friendly errors. |
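
To illustrate the idea behind storing bfloat16 values as uint16 (the behavior that float32_to_bfloat16_as_uint16 is described as providing), here is a minimal NumPy sketch. It is not the library's implementation: the helper names are hypothetical, and the round-to-nearest-even rounding and canonical NaN handling are assumptions made for this example.

```python
import numpy as np


def float32_to_bf16_bits_sketch(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return bfloat16 bit patterns packed as uint16."""
    x32 = np.ascontiguousarray(values, dtype=np.float32)
    bits = x32.view(np.uint32)

    # bfloat16 keeps the top 16 bits of a float32 (sign, exponent, 7 mantissa bits).
    # Assumption: round to nearest, ties to even, by adding 0x7FFF plus the
    # lowest bit that will be kept, then truncating.
    lowest_kept_bit = (bits >> 16) & 1
    rounded = bits + (lowest_kept_bit + np.uint32(0x7FFF))
    bf16_bits = (rounded >> 16).astype(np.uint16)

    # Assumption: map every NaN to one canonical quiet NaN pattern so the
    # rounding add cannot carry into the exponent or sign bits.
    return np.where(np.isnan(x32), np.uint16(0x7FC0), bf16_bits)


def bf16_bits_to_float32_sketch(bits: np.ndarray) -> np.ndarray:
    """Hypothetical inverse: widen uint16 bfloat16 patterns back to float32."""
    widened = np.ascontiguousarray(bits, dtype=np.uint16).astype(np.uint32) << 16
    return widened.view(np.float32)


if __name__ == "__main__":
    sample = np.array([1.0, -2.0, 3.1415927], dtype=np.float32)
    packed = float32_to_bf16_bits_sketch(sample)
    print([hex(int(v)) for v in packed])
    print(bf16_bits_to_float32_sketch(packed))
```

Because bfloat16 shares float32's exponent width, the conversion reduces to rounding and a 16-bit shift, which is why a plain uint16 array can carry the values on hosts without a native bfloat16 dtype.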
Submodules