Python class
PipelineConfig
class max.pipelines.PipelineConfig(*, config_file=None, section_name=None, debug_verify_replay=False, models=<factory>, model_override=<factory>, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, runtime=<factory>, task=PipelineTask.UNDEFINED)
Bases: ConfigFileModel
Configuration for a pipeline.
Contains settings for model selection, batch sizing, sampling, profiling, LoRA adapters, and speculative decoding. Once initialized, all fields are resolved to their final values from CLI flags, config files, environment variables, or internal defaults.
Parameters:
- config_file (str | None)
- section_name (str | None)
- debug_verify_replay (bool)
- models (dict[str, MAXModelConfig])
- model_override (list[str])
- sampling (SamplingConfig)
- profiling (ProfilingConfig)
- lora (LoRAConfig | None)
- speculative (SpeculativeConfig | None)
- runtime (PipelineRuntimeConfig)
- task (PipelineTask)
config_file
Path to the configuration file.
configure_session()
configure_session(session)
Configures an InferenceSession with standard pipeline settings.
Parameters:
- session (InferenceSession)
Return type:
None
debug_verify_replay
debug_verify_replay: bool
Whether to run eager verification before device graph replay.
draft_model
property draft_model: MAXModelConfig | None
The draft model configuration. Alias for models.get("draft").
estimate_signal_buffer_memory()
estimate_signal_buffer_memory(arch_config=None)
Estimates total signal-buffer memory across all devices.
Signal buffers are fixed-size (NUM_BYTES) per-GPU allocations used by P2P collectives. Each independent allocation site contributes one set of ngpus buffers. The base estimate counts the sites visible from PipelineConfig:
- the main model graph (multi-GPU only),
- BlockOffloadEngine for KV-cache offloading, only when its replicate_kv_across_tp path is active (MLA model with DP=1 and multi-device TP). See block_copy_engine.py / transfer_engine.py.
Returns 0 for single-device pipelines.
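The counting rule described above can be sketched as a standalone function. This is only an illustration of the arithmetic, not the library's implementation: the function name, its parameters, and the placeholder value of NUM_BYTES are all assumptions, since this page does not give the real constant or the internal logic.

```python
# Hypothetical placeholder: the real NUM_BYTES value is not stated on this page.
NUM_BYTES = 1 << 20


def estimate_signal_buffer_bytes(
    ngpus: int,
    offload_replicates_kv: bool = False,
    num_bytes: int = NUM_BYTES,
) -> int:
    """Sketch of the per-site counting rule (assumed, not the real code)."""
    # Single-device pipelines need no P2P signal buffers.
    if ngpus <= 1:
        return 0
    # Site 1: the main model graph (multi-GPU only).
    sites = 1
    # Site 2: the KV-cache offload engine, only when it replicates KV across TP.
    if offload_replicates_kv:
        sites += 1
    # Each site contributes one fixed-size buffer per GPU.
    return sites * ngpus * num_bytes


print(estimate_signal_buffer_bytes(1))
print(estimate_signal_buffer_bytes(4))
print(estimate_signal_buffer_bytes(4, offload_replicates_kv=True))
```

Under these assumptions the estimate grows linearly in both the number of GPUs and the number of allocation sites.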
Parameters:
- arch_config (ArchConfig | None): Optional architecture config. When provided and it exposes KV params, the BCE term is gated on the actual replicates_kv_across_tp flag rather than only the kv_connector setting. Without it, the BCE term is added whenever a connector is configured (conservative).
Returns:
Estimated total signal-buffer memory in bytes (across all devices).
Return type:
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX graph quantization encoding.
Returns:
The graph quantization encoding corresponding to the CLI encoding.
log_basic_config()
log_basic_config()
Logs minimal pipeline configuration information.
Logs basic PipelineConfig options including model name, pipeline task, weight path, max_batch_size, max_seq_len, and reserved memory.
Return type:
None
log_pipeline_info()
log_pipeline_info()
Logs comprehensive pipeline and KVCache configuration information.
Retrieves all necessary information from self and the PIPELINE_REGISTRY. Raises an error if the architecture is not found (which should not happen after config resolution).
Return type:
None
lora
lora: LoRAConfig | None
The LoRA configuration.
model
property model: MAXModelConfig
The main model config. Alias for models["main"].
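The difference between the two aliases (models["main"] for model, models.get("draft") for draft_model) can be illustrated with a small stand-in class. ModelStub and ManifestSketch are hypothetical names used only for this sketch; they are not part of max.pipelines, and the real models field has its own type.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelStub:
    """Hypothetical stand-in for MAXModelConfig."""
    model_path: str


@dataclass
class ManifestSketch:
    """Hypothetical stand-in showing the two alias properties."""
    models: dict = field(default_factory=dict)

    @property
    def model(self) -> ModelStub:
        # Indexing: raises KeyError if no "main" entry exists.
        return self.models["main"]

    @property
    def draft_model(self) -> Optional[ModelStub]:
        # dict.get: returns None if no "draft" entry exists.
        return self.models.get("draft")


cfg = ManifestSketch(models={"main": ModelStub("org/main-model")})
print(cfg.model.model_path)
print(cfg.draft_model)
```

In this sketch a missing draft entry is a normal condition (None), while a missing main entry is an error, which matches the return types documented for the two properties.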
model_config
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_override
Per-component model overrides applied before resolution.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that's what pydantic-core passes when calling it.
Parameters:
- self (BaseModel): The BaseModel instance.
- context (Any): The context.
Return type:
None
models
models: _ModelsType
The model manifest containing all model configs keyed by role.
needs_bitmask_constraints
property needs_bitmask_constraints: bool
Whether constrained decoding can fire and requires the bitmask path.
True if the user enabled --enable-structured-output (for user-supplied response_format=json_schema) or a tool parser is configured. Tool-call grammars work without the flag: they are server-generated and gated on having a parser that can both produce the grammar and parse the resulting output.
Drives whether model / sampler graphs are compiled with a bitmask input and whether the D2H pinned buffer is allocated. Distinct from sampling.enable_structured_output, which is the user-facing flag and only gates honoring user-supplied JSON schemas.
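The condition described above reduces to a logical OR of two inputs. The following is a minimal sketch of that rule, assuming the tool parser is represented as an optional name; the function and parameter names are illustrative and do not reflect how the property reads its inputs internally.

```python
from typing import Optional


def needs_bitmask(enable_structured_output: bool, tool_parser: Optional[str]) -> bool:
    """Sketch of the documented rule (hypothetical helper, not the real property)."""
    # Either the user-facing flag is on, or a tool parser is configured.
    return enable_structured_output or tool_parser is not None


# Flag off, no parser: the bitmask path is not needed.
print(needs_bitmask(False, None))
# Flag off, parser configured: tool-call grammars still need the bitmask path.
print(needs_bitmask(False, "some_parser"))
# Flag on, no parser: user-supplied JSON schemas need the bitmask path.
print(needs_bitmask(True, None))
```

The second case is why this property is distinct from sampling.enable_structured_output: the bitmask path can be required even when the user-facing flag is off.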
profiling
profiling: ProfilingConfig
The profiling configuration.
resolve()
resolve()
Validates and resolves the config.
Called after the config is initialized to ensure all config fields are in a valid state.
Return type:
None
runtime
runtime: PipelineRuntimeConfig
The model-agnostic runtime settings for pipeline execution.
sampling
sampling: SamplingConfig
The sampling configuration.
section_name
Optional section name for comprehensive/multi-section config files.
If not provided, values are loaded from the YAML top level (treating the file as an "individual config" file).
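The top-level versus named-section behaviour can be sketched on an already-parsed YAML mapping. This is an assumption-laden illustration: select_section is a hypothetical helper, the keys shown are made up for the example, and the real loader's handling of a missing section is not documented here.

```python
from typing import Any, Dict, Optional


def select_section(parsed: Dict[str, Any], section_name: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: pick the mapping that config values are read from."""
    if section_name is None:
        # "Individual config" file: values live at the top level.
        return parsed
    # Multi-section file: values live under the named section.
    return parsed[section_name]


# Illustrative data standing in for parsed YAML; the keys are invented.
individual = {"max_batch_size": 8}
multi = {"serving": {"max_batch_size": 8}, "eval": {"max_batch_size": 1}}

print(select_section(individual))
print(select_section(multi, "eval"))
```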
speculative
speculative: SpeculativeConfig | None
The speculative decoding configuration.
task
task: PipelineTask
The pipeline task, used for arch disambiguation during config resolution.