Python module

config

Standardized configuration for Pipeline Inference.

AudioGenerationConfig

class max.pipelines.lib.config.AudioGenerationConfig(audio_decoder, audio_decoder_weights='', chunk_size=None, buffer=0, block_causal=False, prepend_prompt_speech_tokens=PrependPromptSpeechTokens.NEVER, prepend_prompt_speech_tokens_causal=False, run_model_test_mode=False, prometheus_metrics_mode=PrometheusMetricsMode.INSTRUMENT_ONLY, *, config_file=None, section_name=None, max_length=None, pipeline_role=PipelineRole.PrefillAndDecode, max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, enable_echo=False, pool_embeddings=True, chat_template=None, use_experimental_kernels='false', use_vendor_blas='false', pdl_level='0', custom_architectures=<factory>, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, force=False, kvcache_ce_watermark=0.95, enable_overlap_scheduler=False, use_legacy_module=True, defer_resolve=False, model=<factory>, draft_model=None, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, audio_decoder_config=<factory>)

Create a new model by parsing and validating input data from keyword arguments.

Raises a pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

  • audio_decoder (str)
  • audio_decoder_weights (str)
  • chunk_size (list[int] | None)
  • buffer (int)
  • block_causal (bool)
  • prepend_prompt_speech_tokens (PrependPromptSpeechTokens)
  • prepend_prompt_speech_tokens_causal (bool)
  • run_model_test_mode (bool)
  • prometheus_metrics_mode (PrometheusMetricsMode)
  • config_file (str | None)
  • section_name (str | None)
  • max_length (int | None)
  • pipeline_role (PipelineRole)
  • max_batch_size (int | None)
  • max_queue_size_tg (int | None)
  • min_batch_size_tg (int | None)
  • ep_size (int)
  • ce_delay_ms (float)
  • enable_prioritize_first_decode (bool)
  • enable_chunked_prefill (bool)
  • enable_in_flight_batching (bool)
  • max_num_steps (int)
  • max_batch_input_tokens (int)
  • enable_echo (bool)
  • pool_embeddings (bool)
  • chat_template (Path | None)
  • use_experimental_kernels (str)
  • use_vendor_blas (str)
  • pdl_level (str)
  • custom_architectures (list[str])
  • zmq_endpoint_base (str)
  • execute_empty_batches (bool)
  • max_batch_total_tokens (int | None)
  • force (bool)
  • kvcache_ce_watermark (float)
  • enable_overlap_scheduler (bool)
  • use_legacy_module (bool)
  • defer_resolve (bool)
  • model (MAXModelConfig)
  • draft_model (MAXModelConfig | None)
  • sampling (SamplingConfig)
  • profiling (ProfilingConfig)
  • lora (LoRAConfig | None)
  • speculative (SpeculativeConfig | None)
  • audio_decoder_config (dict[str, Any])

audio_decoder

audio_decoder: str

audio_decoder_config

audio_decoder_config: dict[str, Any]

audio_decoder_weights

audio_decoder_weights: str

block_causal

block_causal: bool

buffer

buffer: int

chunk_size

chunk_size: list[int] | None

from_flags()

classmethod from_flags(audio_flags, **config_flags)

Parameters:

  • audio_flags
  • **config_flags

Return type:

AudioGenerationConfig

model_config

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init()

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

prepend_prompt_speech_tokens

prepend_prompt_speech_tokens: PrependPromptSpeechTokens

prepend_prompt_speech_tokens_causal

prepend_prompt_speech_tokens_causal: bool

prometheus_metrics_mode

prometheus_metrics_mode: PrometheusMetricsMode

PipelineConfig

class max.pipelines.lib.config.PipelineConfig(*, config_file=None, section_name=None, max_length=None, pipeline_role=PipelineRole.PrefillAndDecode, max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, enable_echo=False, pool_embeddings=True, chat_template=None, use_experimental_kernels='false', use_vendor_blas='false', pdl_level='0', custom_architectures=<factory>, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, force=False, kvcache_ce_watermark=0.95, enable_overlap_scheduler=False, use_legacy_module=True, defer_resolve=False, model=<factory>, draft_model=None, sampling=<factory>, profiling=<factory>, lora=None, speculative=None)

Configuration for a pipeline.

WIP: Once a PipelineConfig is fully initialized, it should be as immutable as possible (frozen=True). All underlying dataclass fields should have been initialized to their default values, whether user-specified via a CLI flag, config file, or environment variable, or set internally to a reasonable default.

Create a new model by parsing and validating input data from keyword arguments.

Raises a pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • max_length (int | None)
  • pipeline_role (PipelineRole)
  • max_batch_size (int | None)
  • max_queue_size_tg (int | None)
  • min_batch_size_tg (int | None)
  • ep_size (int)
  • ce_delay_ms (float)
  • enable_prioritize_first_decode (bool)
  • enable_chunked_prefill (bool)
  • enable_in_flight_batching (bool)
  • max_num_steps (int)
  • max_batch_input_tokens (int)
  • enable_echo (bool)
  • pool_embeddings (bool)
  • chat_template (Path | None)
  • use_experimental_kernels (str)
  • use_vendor_blas (str)
  • pdl_level (str)
  • custom_architectures (list[str])
  • zmq_endpoint_base (str)
  • execute_empty_batches (bool)
  • max_batch_total_tokens (int | None)
  • force (bool)
  • kvcache_ce_watermark (float)
  • enable_overlap_scheduler (bool)
  • use_legacy_module (bool)
  • defer_resolve (bool)
  • model (MAXModelConfig)
  • draft_model (MAXModelConfig | None)
  • sampling (SamplingConfig)
  • profiling (ProfilingConfig)
  • lora (LoRAConfig | None)
  • speculative (SpeculativeConfig | None)

ce_delay_ms

ce_delay_ms: float

chat_template

chat_template: Path | None

configure_session()

configure_session(session)

Configure an InferenceSession with standard pipeline settings.

Parameters:

session (InferenceSession)

Return type:

None

custom_architectures

custom_architectures: list[str]

defer_resolve

defer_resolve: bool

draft_model

draft_model: MAXModelConfig | None

enable_chunked_prefill

enable_chunked_prefill: bool

enable_echo

enable_echo: bool

enable_in_flight_batching

enable_in_flight_batching: bool

enable_overlap_scheduler

enable_overlap_scheduler: bool

enable_prioritize_first_decode

enable_prioritize_first_decode: bool

ep_size

ep_size: int

execute_empty_batches

execute_empty_batches: bool

force

force: bool

graph_quantization_encoding

property graph_quantization_encoding: QuantizationEncoding | None

Converts the CLI encoding to a MAX graph quantization encoding.

Returns:

The graph quantization encoding corresponding to the CLI encoding.

kvcache_ce_watermark

kvcache_ce_watermark: float

log_basic_config()

log_basic_config()

Log minimal pipeline configuration information.

Logs basic PipelineConfig options including model name, pipeline task, weight path, max_batch_size, max_seq_len, and reserved memory.

Return type:

None

log_pipeline_info()

log_pipeline_info()

Log comprehensive pipeline and KVCache configuration information.

Retrieves all necessary information from self and the PIPELINE_REGISTRY. Raises an error if the architecture is not found (which should not happen after config resolution).

Return type:

None

lora

lora: LoRAConfig | None

max_batch_input_tokens

max_batch_input_tokens: int

max_batch_size

max_batch_size: int | None

max_batch_total_tokens

max_batch_total_tokens: int | None

max_length

max_length: int | None

max_num_steps

max_num_steps: int

max_queue_size_tg

max_queue_size_tg: int | None

min_batch_size_tg

min_batch_size_tg: int | None

model

model: MAXModelConfig

model_config

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init()

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

pdl_level

pdl_level: str

pipeline_role

pipeline_role: PipelineRole

pool_embeddings

pool_embeddings: bool

profiling

profiling: ProfilingConfig

resolve()

resolve()

Validates and resolves the config.

This method is called after the config is initialized, to ensure that all config fields have been initialized to a valid state.

Return type:

None

retrieve_chat_template()

retrieve_chat_template()

Return type:

str | None

sampling

sampling: SamplingConfig

speculative

speculative: SpeculativeConfig | None

use_experimental_kernels

use_experimental_kernels: str

use_legacy_module

use_legacy_module: bool

use_vendor_blas

use_vendor_blas: str

zmq_endpoint_base

zmq_endpoint_base: str

PrependPromptSpeechTokens

class max.pipelines.lib.config.PrependPromptSpeechTokens(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

NEVER

NEVER = 'never'

Never prepend the prompt speech tokens sent to the audio decoder.

ONCE

ONCE = 'once'

Prepend the prompt speech tokens to the first block of the audio decoder.

ROLLING

ROLLING = 'rolling'

Prepend the prompt speech tokens to the first block of the audio decoder, and to later blocks to reach the requested buffer size.
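The three modes form a string-valued enum, so the string from a flag or config file maps back to a member by value. A stdlib sketch mirroring the documented values (not an import of the real class):

```python
from enum import Enum

# Stdlib mirror of the documented PrependPromptSpeechTokens values.
class PrependPromptSpeechTokens(str, Enum):
    NEVER = "never"      # never prepend prompt speech tokens
    ONCE = "once"        # prepend only to the first decoder block
    ROLLING = "rolling"  # prepend to the first block, and to later
                         # blocks to reach the requested buffer size

# String values round-trip through lookup-by-value.
mode = PrependPromptSpeechTokens("rolling")
print(mode is PrependPromptSpeechTokens.ROLLING)  # True
```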

PrometheusMetricsMode

class max.pipelines.lib.config.PrometheusMetricsMode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

INSTRUMENT_ONLY

INSTRUMENT_ONLY = 'instrument_only'

Instrument metrics through the Prometheus client library, relying on the application to handle the metrics server.

LAUNCH_MULTIPROC_SERVER

LAUNCH_MULTIPROC_SERVER = 'launch_multiproc_server'

Launch a Prometheus server in multiprocess mode to report metrics.

LAUNCH_SERVER

LAUNCH_SERVER = 'launch_server'

Launch a Prometheus server to handle metrics requests.
