Python class

PipelineRuntimeConfig

class max.pipelines.lib.PipelineRuntimeConfig(*, config_file=None, section_name=None, pipeline_role='prefill_and_decode', max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ep_use_allreduce=False, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=1, max_batch_input_tokens=8192, use_experimental_kernels='false', use_vendor_blas='false', use_vendor_ccl='false', custom_architectures=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, device_graph_capture=None, force=False, kvcache_ce_watermark=0.95, decode_stall_timeout_s=None, decode_request_ttl_s=None, enable_overlap_scheduler=False, allow_unsupported_logprobs=False, allow_extra_request_fields=False, prefer_module_v3=False, reasoning_parser=None, tool_parser=None, temperature=None, thinking_temperature=None, defer_resolve=False, max_vision_cache_entries=256, denoising_cache=<factory>)

Bases: ConfigFileModel

Model-agnostic runtime settings for pipeline execution.

Contains batching, scheduling, and execution configuration that is independent of any particular model architecture.

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • pipeline_role (Literal['prefill_and_decode', 'prefill_only', 'decode_only'])
  • max_batch_size (int | None)
  • max_queue_size_tg (int | None)
  • min_batch_size_tg (int | None)
  • ep_size (int)
  • ep_use_allreduce (bool)
  • ce_delay_ms (float)
  • enable_prioritize_first_decode (bool)
  • enable_chunked_prefill (bool)
  • enable_in_flight_batching (bool)
  • max_num_steps (int)
  • max_batch_input_tokens (int)
  • use_experimental_kernels (str)
  • use_vendor_blas (str)
  • use_vendor_ccl (str)
  • custom_architectures (list[str])
  • execute_empty_batches (bool)
  • max_batch_total_tokens (int | None)
  • device_graph_capture (bool | None)
  • force (bool)
  • kvcache_ce_watermark (float)
  • decode_stall_timeout_s (float | None)
  • decode_request_ttl_s (float | None)
  • enable_overlap_scheduler (bool)
  • allow_unsupported_logprobs (bool)
  • allow_extra_request_fields (bool)
  • prefer_module_v3 (bool)
  • reasoning_parser (str | None)
  • tool_parser (str | None)
  • temperature (float | None)
  • thinking_temperature (float | None)
  • defer_resolve (bool)
  • max_vision_cache_entries (int)
  • denoising_cache (DenoisingCacheConfig)

allow_extra_request_fields

allow_extra_request_fields: bool

allow_unsupported_logprobs

allow_unsupported_logprobs: bool

ce_delay_ms

ce_delay_ms: float

custom_architectures

custom_architectures: list[str]

decode_request_ttl_s

decode_request_ttl_s: float | None

decode_stall_timeout_s

decode_stall_timeout_s: float | None

defer_resolve

defer_resolve: bool

denoising_cache

denoising_cache: DenoisingCacheConfig

device_graph_capture

device_graph_capture: bool | None

enable_chunked_prefill

enable_chunked_prefill: bool

enable_in_flight_batching

enable_in_flight_batching: bool

enable_overlap_scheduler

enable_overlap_scheduler: bool

enable_prioritize_first_decode

enable_prioritize_first_decode: bool

ep_size

ep_size: int

ep_use_allreduce

ep_use_allreduce: bool

execute_empty_batches

execute_empty_batches: bool

force

force: bool

kvcache_ce_watermark

kvcache_ce_watermark: float

max_batch_input_tokens

max_batch_input_tokens: int

max_batch_size

max_batch_size: int | None

max_batch_total_tokens

max_batch_total_tokens: int | None

max_num_steps

max_num_steps: int

max_queue_size_tg

max_queue_size_tg: int | None

max_vision_cache_entries

max_vision_cache_entries: int

min_batch_size_tg

min_batch_size_tg: int | None

model_config

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}

Pydantic configuration for this model class. It should be a dictionary conforming to pydantic.config.ConfigDict.
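
In pydantic, `'extra': 'forbid'` means unknown field names are rejected rather than ignored, and `'strict': False` means compatible input values are coerced to the declared type. The stdlib-only sketch below emulates those two behaviours for a flat dictionary so it can run without MAX or pydantic installed. The `FIELDS` table and the `validate` helper are hypothetical names introduced for illustration, and pydantic's actual coercion rules are richer than calling the type constructor.

```python
# Hypothetical field table for illustration; only two fields from the
# class signature are included.
FIELDS: dict[str, type] = {"max_num_steps": int, "kvcache_ce_watermark": float}


def validate(data: dict[str, object]) -> dict[str, object]:
    """Emulate extra='forbid' plus lax (strict=False) coercion."""
    unknown = sorted(set(data) - set(FIELDS))
    if unknown:
        # extra='forbid': unknown keys are an error, not silently dropped.
        raise ValueError(f"extra fields not permitted: {unknown}")
    # strict=False: compatible values such as "4" are coerced to the field type.
    return {name: FIELDS[name](value) for name, value in data.items()}


print(validate({"max_num_steps": "4"}))

try:
    validate({"max_num_stpes": 4})  # misspelled on purpose
except ValueError as err:
    print(err)
```

The practical consequence is that a misspelled setting fails loudly at construction time instead of silently falling back to its default.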

model_post_init()

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

pipeline_role

pipeline_role: PipelineRole
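
The parameter list gives three accepted values for `pipeline_role`. The sketch below shows one way to check a role string against those values using only `typing`. Defining `PipelineRole` as a `Literal` alias here is an assumption based on the parameter list, and `check_role` is a hypothetical helper, not part of the library.

```python
from typing import Literal, get_args

# Role names copied from the parameter list. The alias name mirrors the
# attribute annotation, but defining it this way is an assumption.
PipelineRole = Literal["prefill_and_decode", "prefill_only", "decode_only"]


def check_role(value: str) -> str:
    """Return value unchanged if it is a documented role, else raise."""
    allowed = get_args(PipelineRole)
    if value not in allowed:
        raise ValueError(f"pipeline_role must be one of {allowed}, got {value!r}")
    return value


print(check_role("decode_only"))
```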

prefer_module_v3

prefer_module_v3: bool

reasoning_parser

reasoning_parser: str | None

temperature

temperature: float | None

thinking_temperature

thinking_temperature: float | None

tool_parser

tool_parser: str | None

use_experimental_kernels

use_experimental_kernels: str

use_vendor_blas

use_vendor_blas: str

use_vendor_ccl

use_vendor_ccl: str