PipelineRuntimeConfig

class max.pipelines.lib.PipelineRuntimeConfig(*, config_file=None, section_name=None, pipeline_role='prefill_and_decode', max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ep_use_allreduce=False, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, use_experimental_kernels='false', use_vendor_blas='false', use_vendor_ccl='false', custom_architectures=<factory>, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, device_graph_capture=None, force=False, kvcache_ce_watermark=0.95, decode_stall_timeout_s=None, enable_overlap_scheduler=False, prefer_module_v3=False, reasoning_parser=None, defer_resolve=False, max_vision_cache_entries=256, denoising_cache=<factory>)

Bases: ConfigFileModel

Model-agnostic runtime settings for pipeline execution.

Contains batching, scheduling, and execution configuration that is independent of any particular model architecture.

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • pipeline_role (Literal['prefill_and_decode', 'prefill_only', 'decode_only'])
  • max_batch_size (int | None)
  • max_queue_size_tg (int | None)
  • min_batch_size_tg (int | None)
  • ep_size (int)
  • ep_use_allreduce (bool)
  • ce_delay_ms (float)
  • enable_prioritize_first_decode (bool)
  • enable_chunked_prefill (bool)
  • enable_in_flight_batching (bool)
  • max_num_steps (int)
  • max_batch_input_tokens (int)
  • use_experimental_kernels (str)
  • use_vendor_blas (str)
  • use_vendor_ccl (str)
  • custom_architectures (list[str])
  • zmq_endpoint_base (str)
  • execute_empty_batches (bool)
  • max_batch_total_tokens (int | None)
  • device_graph_capture (bool | None)
  • force (bool)
  • kvcache_ce_watermark (float)
  • decode_stall_timeout_s (float | None)
  • enable_overlap_scheduler (bool)
  • prefer_module_v3 (bool)
  • reasoning_parser (str | None)
  • defer_resolve (bool)
  • max_vision_cache_entries (int)
  • denoising_cache (DenoisingCacheConfig)

ce_delay_ms

ce_delay_ms: float

custom_architectures

custom_architectures: list[str]

decode_stall_timeout_s

decode_stall_timeout_s: float | None

defer_resolve

defer_resolve: bool

denoising_cache

denoising_cache: DenoisingCacheConfig

device_graph_capture

device_graph_capture: bool | None

enable_chunked_prefill

enable_chunked_prefill: bool

enable_in_flight_batching

enable_in_flight_batching: bool

enable_overlap_scheduler

enable_overlap_scheduler: bool

enable_prioritize_first_decode

enable_prioritize_first_decode: bool

ep_size

ep_size: int

ep_use_allreduce

ep_use_allreduce: bool

execute_empty_batches

execute_empty_batches: bool

force

force: bool

kvcache_ce_watermark

kvcache_ce_watermark: float

max_batch_input_tokens

max_batch_input_tokens: int

max_batch_size

max_batch_size: int | None

max_batch_total_tokens

max_batch_total_tokens: int | None

max_num_steps

max_num_steps: int

max_queue_size_tg

max_queue_size_tg: int | None

max_vision_cache_entries

max_vision_cache_entries: int

min_batch_size_tg

min_batch_size_tg: int | None

model_config

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}

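Pydantic configuration for this class (the "model" here is the Pydantic model, not an ML model). It must be a dictionary conforming to pydantic.config.ConfigDict. This class rejects undeclared fields ('extra': 'forbid') and validates in lax rather than strict mode ('strict': False).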

model_post_init()

model_post_init(context, /)

This method behaves like a BaseModel method that initializes private attributes.

It takes context as an argument because pydantic-core passes it when calling the method.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The validation context that pydantic-core passes in, if any.

Return type:

None

pipeline_role

pipeline_role: PipelineRole
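Per the parameter list, PipelineRole admits exactly three values: 'prefill_and_decode' (the default), 'prefill_only', and 'decode_only'. A minimal sketch, assuming you want to check a role string (for example, one read from a command-line flag) before constructing the config. PipelineRoleSketch and parse_role are hypothetical and defined locally; they are not imported from MAX.

```python
from typing import Literal, cast, get_args

# Local mirror of the documented literal; NOT imported from MAX.
PipelineRoleSketch = Literal["prefill_and_decode", "prefill_only", "decode_only"]


def parse_role(value: str) -> PipelineRoleSketch:
    # get_args returns the tuple of allowed literal values.
    allowed = get_args(PipelineRoleSketch)
    if value not in allowed:
        raise ValueError(f"pipeline_role must be one of {allowed}, got {value!r}")
    return cast(PipelineRoleSketch, value)


print(parse_role("prefill_only"))  # prefill_only
```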

prefer_module_v3

prefer_module_v3: bool

reasoning_parser

reasoning_parser: str | None

use_experimental_kernels

use_experimental_kernels: str

use_vendor_blas

use_vendor_blas: str

use_vendor_ccl

use_vendor_ccl: str

zmq_endpoint_base

zmq_endpoint_base: str
