IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Python class

PipelineConfig

class max.pipelines.PipelineConfig(*, config_file=None, section_name=None, debug_verify_replay=False, models=<factory>, model_override=<factory>, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, runtime=<factory>, task=PipelineTask.UNDEFINED)

Bases: ConfigFileModel

Configuration for a pipeline.

Contains settings for model selection, batch sizing, sampling, profiling, LoRA adapters, and speculative decoding. Once initialized, all fields are resolved to their final values from CLI flags, config files, environment variables, or internal defaults.
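The layered resolution described above can be pictured with a small sketch. It is not the library's implementation: the function `resolve_field`, the sample field values, and in particular the precedence order (CLI, then config file, then environment, then defaults) are assumptions made for illustration, since the text above lists the sources without stating which one wins.

```python
from typing import Any, Mapping


def resolve_field(
    name: str,
    cli: Mapping[str, Any],
    config_file: Mapping[str, Any],
    env: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    """Return the first non-None value for `name`, checking sources in order."""
    # ASSUMPTION: this precedence order is illustrative only; the real
    # PipelineConfig may rank its sources differently.
    for source in (cli, config_file, env, defaults):
        if source.get(name) is not None:
            return source[name]
    raise KeyError(name)


# Hypothetical inputs; field placement here does not mirror the real class.
cli = {"max_batch_size": 8}
config_file = {"max_batch_size": 4, "max_seq_len": 2048}
env: dict[str, Any] = {}
defaults = {"max_batch_size": 1, "max_seq_len": 1024, "task": "undefined"}

# Every field known to `defaults` ends up with exactly one final value.
resolved = {
    name: resolve_field(name, cli, config_file, env, defaults)
    for name in defaults
}
```

After this runs, `resolved` holds one final value per field, which matches the idea that an initialized config no longer carries unresolved settings.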

config_file​

config_file: str | None

Path to the configuration file.

configure_session()​

configure_session(session)

Configures an InferenceSession with standard pipeline settings.

Parameters:

session (InferenceSession)

Return type:

None

debug_verify_replay​

debug_verify_replay: bool

Whether to run eager verification before device graph replay.

draft_model​

property draft_model: MAXModelConfig | None

The draft model configuration. Alias for models.get("draft").

estimate_signal_buffer_memory()​

estimate_signal_buffer_memory(arch_config=None)

Estimates total signal-buffer memory across all devices.

Signal buffers are fixed-size (NUM_BYTES) per-GPU allocations used by P2P collectives. Each independent allocation site contributes one set of ngpus buffers. The base estimate counts the sites visible from PipelineConfig:

  • main model graph (multi-GPU only),
  • BlockOffloadEngine for KV-cache offloading, only when its replicate_kv_across_tp path is active (MLA model with DP=1 and multi-device TP). See block_copy_engine.py / transfer_engine.py.

Returns 0 for single-device pipelines.

Parameters:

arch_config (ArchConfig | None) – Optional architecture config. When it is provided and exposes KV params, the BCE term is gated on the actual replicates_kv_across_tp flag rather than only the kv_connector setting. Without it, the BCE term is added whenever a connector is configured (a conservative estimate).

Returns:

Estimated total signal-buffer memory in bytes (across all devices).

Return type:

int
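The arithmetic behind this estimate can be sketched as follows. This is a rough reading of the description above, not the method's actual code: the helper names, the `num_bytes` parameter (standing in for NUM_BYTES, whose value is not given here), and the exact way the connector setting and the replication flag combine are assumptions.

```python
from typing import Optional


def bce_term_active(
    kv_connector_configured: bool,
    replicates_kv_across_tp: Optional[bool],
) -> bool:
    """Decide whether the block-offload allocation site is counted.

    `replicates_kv_across_tp` is None when no architecture config (or no
    KV params) is available. ASSUMPTION: in that case the term is counted
    whenever a connector is configured, as the conservative fallback.
    """
    if replicates_kv_across_tp is None:
        return kv_connector_configured
    return kv_connector_configured and replicates_kv_across_tp


def estimate_signal_buffer_bytes(ngpus: int, num_bytes: int, bce_active: bool) -> int:
    """Total bytes across all devices: sites * ngpus * num_bytes."""
    if ngpus <= 1:
        return 0  # single-device pipelines allocate no signal buffers
    sites = 1  # main model graph (multi-GPU only)
    if bce_active:
        sites += 1  # block-offload engine site
    return sites * ngpus * num_bytes
```

With four GPUs and a hypothetical 1024-byte buffer, the sketch yields 4096 bytes for the main graph alone and 8192 bytes once the offload site is counted.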

graph_quantization_encoding​

property graph_quantization_encoding: QuantizationEncoding | None

Converts the CLI encoding to a MAX graph quantization encoding.

Returns:

The graph quantization encoding corresponding to the CLI encoding.

log_basic_config()​

log_basic_config()

Logs minimal pipeline configuration information.

Logs basic PipelineConfig options including model name, pipeline task, weight path, max_batch_size, max_seq_len, and reserved memory.

Return type:

None

log_pipeline_info()​

log_pipeline_info()

Logs comprehensive pipeline and KVCache configuration information.

Retrieves all necessary information from self and the PIPELINE_REGISTRY. Raises an error if the architecture is not found (which should not happen after config resolution).

Return type:

None

lora​

lora: LoRAConfig | None

The LoRA configuration.

model​

property model: MAXModelConfig

The main model config. Alias for models["main"].
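The two alias properties, model and draft_model, differ in how they treat a missing role: models["main"] raises if the key is absent, while models.get("draft") returns None. A minimal sketch of that pattern, using hypothetical stand-in classes (`ModelStub`, `ManifestSketch`) rather than the real MAXModelConfig and PipelineConfig:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelStub:
    """Hypothetical stand-in for MAXModelConfig."""

    model_path: str


@dataclass
class ManifestSketch:
    """Hypothetical stand-in for a config holding a role-keyed manifest."""

    models: dict[str, ModelStub] = field(default_factory=dict)

    @property
    def model(self) -> ModelStub:
        # Mirrors models["main"]: a missing main model raises KeyError.
        return self.models["main"]

    @property
    def draft_model(self) -> Optional[ModelStub]:
        # Mirrors models.get("draft"): a missing draft model yields None.
        return self.models.get("draft")


cfg = ManifestSketch(models={"main": ModelStub(model_path="example/main")})
main_path = cfg.model.model_path
draft = cfg.draft_model
```

Here `draft` is None because no draft role was registered, which is why the real draft_model property is typed as MAXModelConfig | None.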

model_config​

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'strict': False}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_override​

model_override: list[str]

Per-component model overrides applied before resolution.

model_post_init()​

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

models​

models: _ModelsType

The model manifest containing all model configs keyed by role.

needs_bitmask_constraints​

property needs_bitmask_constraints: bool

Whether constrained decoding can fire and requires the bitmask path.

True if the user enabled --enable-structured-output (for user-supplied response_format=json_schema) or a tool parser is configured (tool-call grammars work without the flag: they are server-generated and gated on having a parser that can both produce the grammar and parse the resulting output).

Drives whether model / sampler graphs are compiled with a bitmask input and whether the D2H pinned buffer is allocated. Distinct from sampling.enable_structured_output, which is the user-facing flag and only gates honoring user-supplied JSON schemas.
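The condition described above reduces to a simple disjunction. The sketch below is an illustration only: the function signature and the `tool_parser` argument (a parser name, or None when no parser is configured) are hypothetical and do not reflect how the property reads its inputs.

```python
from typing import Optional


def needs_bitmask_constraints(
    enable_structured_output: bool,
    tool_parser: Optional[str],
) -> bool:
    """True when constrained decoding can fire on either path."""
    # The user-facing flag covers user-supplied JSON schemas; a configured
    # tool parser covers server-generated tool-call grammars.
    return enable_structured_output or tool_parser is not None
```

Note that the flag alone is not the whole story: with the flag off and a parser configured, the bitmask path is still required.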

profiling​

profiling: ProfilingConfig

The profiling configuration.

resolve()​

resolve()

Validates and resolves the config.

Called after the config is initialized to ensure all config fields are in a valid state.

Return type:

None

runtime​

runtime: PipelineRuntimeConfig

The model-agnostic runtime settings for pipeline execution.

sampling​

sampling: SamplingConfig

The sampling configuration.

section_name​

section_name: str | None

Optional section name for comprehensive/multi-section config files.

If not provided, values are loaded from the YAML top-level (treating the file as an "individual config" file).
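The difference between the two file layouts can be sketched on already-parsed YAML, represented here as plain dictionaries. The helper `select_section`, the section names, and the field values are hypothetical; the sketch only illustrates the top-level versus named-section lookup described above.

```python
from typing import Any, Mapping, Optional


def select_section(
    parsed: Mapping[str, Any],
    section_name: Optional[str] = None,
) -> Mapping[str, Any]:
    """Pick the mapping that configuration values are read from."""
    if section_name is None:
        # "Individual config" file: values live at the top level.
        return parsed
    # Multi-section file: values live under the named section.
    return parsed[section_name]


# Hypothetical parsed contents of an individual config file.
individual = {"max_seq_len": 1024}

# Hypothetical parsed contents of a multi-section config file.
comprehensive = {
    "serve": {"max_seq_len": 4096},
    "benchmark": {"max_seq_len": 512},
}

from_top_level = select_section(individual)
from_section = select_section(comprehensive, "serve")
```

In this sketch a missing section raises KeyError; how the real loader reports that case is not covered here.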

speculative​

speculative: SpeculativeConfig | None

The speculative decoding configuration.

task​

task: PipelineTask

The pipeline task, used for arch disambiguation during config resolution.