config
Python module providing standardized configuration for pipeline inference.
AudioGenerationConfig
class max.pipelines.lib.config.AudioGenerationConfig(audio_decoder, audio_decoder_weights='', chunk_size=None, buffer=0, block_causal=False, prepend_prompt_speech_tokens=PrependPromptSpeechTokens.NEVER, prepend_prompt_speech_tokens_causal=False, run_model_test_mode=False, prometheus_metrics_mode=PrometheusMetricsMode.INSTRUMENT_ONLY, *, config_file=None, section_name=None, max_length=None, pipeline_role=PipelineRole.PrefillAndDecode, max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, enable_echo=False, pool_embeddings=True, chat_template=None, use_experimental_kernels='false', use_vendor_blas='false', pdl_level='0', custom_architectures=<factory>, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, force=False, kvcache_ce_watermark=0.95, enable_overlap_scheduler=False, use_legacy_module=True, defer_resolve=False, model=<factory>, draft_model=None, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, audio_decoder_config=<factory>)
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Parameters:
- audio_decoder (str)
- audio_decoder_weights (str)
- chunk_size (list[int] | None)
- buffer (int)
- block_causal (bool)
- prepend_prompt_speech_tokens (PrependPromptSpeechTokens)
- prepend_prompt_speech_tokens_causal (bool)
- run_model_test_mode (bool)
- prometheus_metrics_mode (PrometheusMetricsMode)
- config_file (str | None)
- section_name (str | None)
- max_length (int | None)
- pipeline_role (PipelineRole)
- max_batch_size (int | None)
- max_queue_size_tg (int | None)
- min_batch_size_tg (int | None)
- ep_size (int)
- ce_delay_ms (float)
- enable_prioritize_first_decode (bool)
- enable_chunked_prefill (bool)
- enable_in_flight_batching (bool)
- max_num_steps (int)
- max_batch_input_tokens (int)
- enable_echo (bool)
- pool_embeddings (bool)
- chat_template (Path | None)
- use_experimental_kernels (str)
- use_vendor_blas (str)
- pdl_level (str)
- custom_architectures (list[str])
- zmq_endpoint_base (str)
- execute_empty_batches (bool)
- max_batch_total_tokens (int | None)
- force (bool)
- kvcache_ce_watermark (float)
- enable_overlap_scheduler (bool)
- use_legacy_module (bool)
- defer_resolve (bool)
- model (MAXModelConfig)
- draft_model (MAXModelConfig | None)
- sampling (SamplingConfig)
- profiling (ProfilingConfig)
- lora (LoRAConfig | None)
- speculative (SpeculativeConfig | None)
- audio_decoder_config (dict[str, Any])
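As a hypothetical sketch, an `AudioGenerationConfig` can be constructed directly with keyword arguments, which pydantic parses and validates (assumes the `max` package is installed; the decoder name and weights path below are placeholders, not real values):

```python
# Hypothetical sketch: constructing an AudioGenerationConfig by hand.
# The decoder name and weights path are placeholders.
from max.pipelines.lib.config import (
    AudioGenerationConfig,
    PrependPromptSpeechTokens,
)

audio_config = AudioGenerationConfig(
    audio_decoder="my-audio-decoder",          # placeholder architecture name
    audio_decoder_weights="/path/to/weights",  # placeholder weights path
    buffer=0,
    block_causal=False,
    prepend_prompt_speech_tokens=PrependPromptSpeechTokens.ONCE,
)
```

Invalid keyword values raise `pydantic_core.ValidationError` rather than failing later at inference time.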
audio_decoder
audio_decoder: str
audio_decoder_config
audio_decoder_config: dict[str, Any]
audio_decoder_weights
audio_decoder_weights: str
block_causal
block_causal: bool
buffer
buffer: int
chunk_size
chunk_size: list[int] | None
from_flags()
classmethod from_flags(audio_flags, **config_flags)
Parameters:
- audio_flags
- **config_flags
Return type:
model_config
model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
Parameters:
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
Return type:
None
prepend_prompt_speech_tokens
prepend_prompt_speech_tokens: PrependPromptSpeechTokens
prepend_prompt_speech_tokens_causal
prepend_prompt_speech_tokens_causal: bool
prometheus_metrics_mode
prometheus_metrics_mode: PrometheusMetricsMode
PipelineConfig
class max.pipelines.lib.config.PipelineConfig(*, config_file=None, section_name=None, max_length=None, pipeline_role=PipelineRole.PrefillAndDecode, max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, enable_echo=False, pool_embeddings=True, chat_template=None, use_experimental_kernels='false', use_vendor_blas='false', pdl_level='0', custom_architectures=<factory>, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, force=False, kvcache_ce_watermark=0.95, enable_overlap_scheduler=False, use_legacy_module=True, defer_resolve=False, model=<factory>, draft_model=None, sampling=<factory>, profiling=<factory>, lora=None, speculative=None)
Configuration for a pipeline.
WIP - Once a PipelineConfig is fully initialized, it should be as immutable as possible (frozen=True). All underlying dataclass fields should have been initialized to their default values, be it user specified via some CLI flag, config file, environment variable, or internally set to a reasonable default.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Parameters:
- config_file (str | None)
- section_name (str | None)
- max_length (int | None)
- pipeline_role (PipelineRole)
- max_batch_size (int | None)
- max_queue_size_tg (int | None)
- min_batch_size_tg (int | None)
- ep_size (int)
- ce_delay_ms (float)
- enable_prioritize_first_decode (bool)
- enable_chunked_prefill (bool)
- enable_in_flight_batching (bool)
- max_num_steps (int)
- max_batch_input_tokens (int)
- enable_echo (bool)
- pool_embeddings (bool)
- chat_template (Path | None)
- use_experimental_kernels (str)
- use_vendor_blas (str)
- pdl_level (str)
- custom_architectures (list[str])
- zmq_endpoint_base (str)
- execute_empty_batches (bool)
- max_batch_total_tokens (int | None)
- force (bool)
- kvcache_ce_watermark (float)
- enable_overlap_scheduler (bool)
- use_legacy_module (bool)
- defer_resolve (bool)
- model (MAXModelConfig)
- draft_model (MAXModelConfig | None)
- sampling (SamplingConfig)
- profiling (ProfilingConfig)
- lora (LoRAConfig | None)
- speculative (SpeculativeConfig | None)
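The typical lifecycle described above (initialize from keyword arguments, then resolve before use) can be sketched as follows. This is a hypothetical example that assumes the `max` package is installed; the field values shown are illustrative, not recommendations:

```python
# Hypothetical sketch: building and resolving a PipelineConfig.
# Field values here are illustrative only.
from max.pipelines.lib.config import PipelineConfig

config = PipelineConfig(
    max_batch_size=32,
    max_length=4096,
    enable_chunked_prefill=True,
)

# resolve() validates the config and ensures all fields are in a valid
# state; after this point the config should be treated as immutable.
config.resolve()

# Log the minimal configuration summary (model name, task, batch size, ...).
config.log_basic_config()
```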
ce_delay_ms
ce_delay_ms: float
chat_template
chat_template: Path | None
configure_session()
configure_session(session)
Configure an InferenceSession with standard pipeline settings.
Parameters:
- session (InferenceSession)
Return type:
None
custom_architectures
custom_architectures: list[str]
defer_resolve
defer_resolve: bool
draft_model
draft_model: MAXModelConfig | None
enable_chunked_prefill
enable_chunked_prefill: bool
enable_echo
enable_echo: bool
enable_in_flight_batching
enable_in_flight_batching: bool
enable_overlap_scheduler
enable_overlap_scheduler: bool
enable_prioritize_first_decode
enable_prioritize_first_decode: bool
ep_size
ep_size: int
execute_empty_batches
execute_empty_batches: bool
force
force: bool
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX graph quantization encoding.
Returns:
The graph quantization encoding corresponding to the CLI encoding.
kvcache_ce_watermark
kvcache_ce_watermark: float
log_basic_config()
log_basic_config()
Log minimal pipeline configuration information.
Logs basic PipelineConfig options including model name, pipeline task, weight path, max_batch_size, max_seq_len, and reserved memory.
Return type:
None
log_pipeline_info()
log_pipeline_info()
Log comprehensive pipeline and KVCache configuration information.
Retrieves all necessary information from self and the PIPELINE_REGISTRY. Raises an error if architecture is not found (which should not happen after config resolution).
Return type:
None
lora
lora: LoRAConfig | None
max_batch_input_tokens
max_batch_input_tokens: int
max_batch_size
max_batch_size: int | None
max_batch_total_tokens
max_batch_total_tokens: int | None
max_length
max_length: int | None
max_num_steps
max_num_steps: int
max_queue_size_tg
max_queue_size_tg: int | None
min_batch_size_tg
min_batch_size_tg: int | None
model
model: MAXModelConfig
model_config
model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
Parameters:
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
Return type:
None
pdl_level
pdl_level: str
pipeline_role
pipeline_role: PipelineRole
pool_embeddings
pool_embeddings: bool
profiling
profiling: ProfilingConfig
resolve()
resolve()
Validates and resolves the config.
This method is called after the config is initialized, to ensure that all config fields have been initialized to a valid state.
Return type:
None
retrieve_chat_template()
retrieve_chat_template()
Return type:
str | None
sampling
sampling: SamplingConfig
speculative
speculative: SpeculativeConfig | None
use_experimental_kernels
use_experimental_kernels: str
use_legacy_module
use_legacy_module: bool
use_vendor_blas
use_vendor_blas: str
zmq_endpoint_base
zmq_endpoint_base: str
PrependPromptSpeechTokens
class max.pipelines.lib.config.PrependPromptSpeechTokens(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
NEVER
NEVER = 'never'
Never prepend the prompt speech tokens sent to the audio decoder.
ONCE
ONCE = 'once'
Prepend the prompt speech tokens to the first block of the audio decoder.
ROLLING
ROLLING = 'rolling'
Prepend the prompt speech tokens to the first block of the audio decoder, and to later blocks to reach the requested buffer size.
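The three modes above can be illustrated with a stdlib sketch that mirrors the documented string values (a local stand-in for illustration only; the real class lives in `max.pipelines.lib.config`):

```python
# Local stand-in mirroring the documented PrependPromptSpeechTokens
# values; not the real class from max.pipelines.lib.config.
from enum import Enum

class PrependPromptSpeechTokens(str, Enum):
    NEVER = "never"      # never prepend prompt speech tokens
    ONCE = "once"        # prepend only to the first decoder block
    ROLLING = "rolling"  # prepend to the first block, and to later
                         # blocks to reach the requested buffer size

# String values round-trip through the enum constructor, which is how a
# string setting such as "once" would map onto a member.
mode = PrependPromptSpeechTokens("once")
print(mode is PrependPromptSpeechTokens.ONCE)  # True
```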
PrometheusMetricsMode
class max.pipelines.lib.config.PrometheusMetricsMode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
INSTRUMENT_ONLY
INSTRUMENT_ONLY = 'instrument_only'
Instrument metrics through the Prometheus client library, relying on the application to handle the metrics server.
LAUNCH_MULTIPROC_SERVER
LAUNCH_MULTIPROC_SERVER = 'launch_multiproc_server'
Launch a Prometheus server in multiprocess mode to report metrics.
LAUNCH_SERVER
LAUNCH_SERVER = 'launch_server'
Launch a Prometheus server to handle metrics requests.