PipelineRuntimeConfig
class max.pipelines.lib.PipelineRuntimeConfig(*, config_file=None, section_name=None, pipeline_role='prefill_and_decode', max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ep_use_allreduce=False, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, use_experimental_kernels='false', use_vendor_blas='false', use_vendor_ccl='false', custom_architectures=<factory>, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, device_graph_capture=None, force=False, kvcache_ce_watermark=0.95, decode_stall_timeout_s=None, enable_overlap_scheduler=False, prefer_module_v3=False, reasoning_parser=None, defer_resolve=False, max_vision_cache_entries=256, denoising_cache=<factory>)
Bases: ConfigFileModel
Model-agnostic runtime settings for pipeline execution.
Contains batching, scheduling, and execution configuration that is independent of any particular model architecture.
Parameters:
- config_file (str | None)
- section_name (str | None)
- pipeline_role (Literal['prefill_and_decode', 'prefill_only', 'decode_only'])
- max_batch_size (int | None)
- max_queue_size_tg (int | None)
- min_batch_size_tg (int | None)
- ep_size (int)
- ep_use_allreduce (bool)
- ce_delay_ms (float)
- enable_prioritize_first_decode (bool)
- enable_chunked_prefill (bool)
- enable_in_flight_batching (bool)
- max_num_steps (int)
- max_batch_input_tokens (int)
- use_experimental_kernels (str)
- use_vendor_blas (str)
- use_vendor_ccl (str)
- custom_architectures (list[str])
- zmq_endpoint_base (str)
- execute_empty_batches (bool)
- max_batch_total_tokens (int | None)
- device_graph_capture (bool | None)
- force (bool)
- kvcache_ce_watermark (float)
- decode_stall_timeout_s (float | None)
- enable_overlap_scheduler (bool)
- prefer_module_v3 (bool)
- reasoning_parser (str | None)
- defer_resolve (bool)
- max_vision_cache_entries (int)
- denoising_cache (DenoisingCacheConfig)
ce_delay_ms
ce_delay_ms: float
custom_architectures
custom_architectures: list[str]
decode_stall_timeout_s
decode_stall_timeout_s: float | None
defer_resolve
defer_resolve: bool
denoising_cache
denoising_cache: DenoisingCacheConfig
device_graph_capture
device_graph_capture: bool | None
enable_chunked_prefill
enable_chunked_prefill: bool
enable_in_flight_batching
enable_in_flight_batching: bool
enable_overlap_scheduler
enable_overlap_scheduler: bool
enable_prioritize_first_decode
enable_prioritize_first_decode: bool
ep_size
ep_size: int
ep_use_allreduce
ep_use_allreduce: bool
execute_empty_batches
execute_empty_batches: bool
force
force: bool
kvcache_ce_watermark
kvcache_ce_watermark: float
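This page documents only the type and default (0.95) of kvcache_ce_watermark. The sketch below assumes, as the name suggests, that the value is the KV-cache utilization fraction up to which the scheduler still admits new context-encoding (CE, prefill) work. The helper can_admit_ce and its block-count arguments are hypothetical and are not part of MAX.

```python
def can_admit_ce(used_blocks: int, needed_blocks: int, total_blocks: int,
                 watermark: float = 0.95) -> bool:
    """Hypothetical check: admit a prefill only if projected KV-cache use stays at or below the watermark."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    if not 0.0 < watermark <= 1.0:
        raise ValueError("watermark must be in (0, 1]")
    # Fraction of the cache that would be occupied after admitting this request.
    projected = (used_blocks + needed_blocks) / total_blocks
    return projected <= watermark


print(can_admit_ce(used_blocks=900, needed_blocks=40, total_blocks=1000))  # True  (0.94)
print(can_admit_ce(used_blocks=900, needed_blocks=60, total_blocks=1000))  # False (0.96)
```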
max_batch_input_tokens
max_batch_input_tokens: int
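max_batch_input_tokens defaults to 8192 and enable_chunked_prefill defaults to True. Assuming chunked prefill has its usual meaning, namely that a prompt longer than the per-batch input-token budget is split across several context-encoding steps (this page does not describe the exact MAX scheduling policy), the hypothetical helper prefill_chunks below computes the chunk sizes for a single prompt.

```python
def prefill_chunks(prompt_tokens: int, max_batch_input_tokens: int = 8192) -> list[int]:
    """Hypothetical helper: split one prompt's token count into chunks no larger than the budget."""
    if prompt_tokens < 0 or max_batch_input_tokens <= 0:
        raise ValueError("prompt_tokens must be >= 0 and the budget must be positive")
    full, rest = divmod(prompt_tokens, max_batch_input_tokens)
    # Full-size chunks first, then any remainder as a final, smaller chunk.
    return [max_batch_input_tokens] * full + ([rest] if rest else [])


print(prefill_chunks(20_000))  # [8192, 8192, 3616]
print(prefill_chunks(4_000))   # [4000]
```

The sketch treats the prompt in isolation. When other requests share a batch, they share the same token budget, so a prompt's actual chunks could be smaller.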
max_batch_size
max_batch_size: int | None
max_batch_total_tokens
max_batch_total_tokens: int | None
max_num_steps
max_num_steps: int
max_queue_size_tg
max_queue_size_tg: int | None
max_vision_cache_entries
max_vision_cache_entries: int
min_batch_size_tg
min_batch_size_tg: int | None
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}
Configuration for the model; it should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).
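Here, model_config sets extra='forbid', so unknown keys are rejected, and strict=False, which selects pydantic's lax mode, where compatible inputs such as the string "4096" are coerced to an int. The stdlib sketch below approximates those two rules for a hypothetical three-field subset. Pydantic's real lax-mode conversion table is broader, and the real model raises pydantic.ValidationError rather than the ValueError used here.

```python
from typing import Any

# Hypothetical subset of field names and their declared types.
FIELD_TYPES: dict[str, type] = {
    "max_batch_input_tokens": int,
    "kvcache_ce_watermark": float,
    "enable_chunked_prefill": bool,
}
_TRUE = {"true", "t", "yes", "y", "on", "1"}
_FALSE = {"false", "f", "no", "n", "off", "0"}


def _coerce(value: Any, target: type) -> Any:
    """Approximate lax-mode coercion for bool, int and float fields."""
    if target is bool:
        if isinstance(value, bool):
            return value
        if isinstance(value, str):
            text = value.strip().lower()
            if text in _TRUE:
                return True
            if text in _FALSE:
                return False
    elif not isinstance(value, bool):  # this sketch never treats True/False as numbers
        if isinstance(value, target):
            return value
        # Accept numeric strings, and ints for float fields; reject e.g. 8.5 for an int.
        if isinstance(value, str) or (target is float and isinstance(value, int)):
            try:
                return target(value)
            except ValueError:
                pass
    raise ValueError(f"cannot coerce {value!r} to {target.__name__}")


def validate(raw: dict[str, Any]) -> dict[str, Any]:
    """Reject unknown keys (extra='forbid'), then coerce each value (strict=False)."""
    unknown = sorted(set(raw) - set(FIELD_TYPES))
    if unknown:
        raise ValueError(f"extra fields not permitted: {unknown}")
    return {name: _coerce(value, FIELD_TYPES[name]) for name, value in raw.items()}


print(validate({"max_batch_input_tokens": "4096", "enable_chunked_prefill": "false"}))
# {'max_batch_input_tokens': 4096, 'enable_chunked_prefill': False}

try:
    validate({"max_batch_tokens": 4096})  # misspelled field name
    extra_rejected = False
except ValueError as exc:
    extra_rejected = True
    print(exc)  # extra fields not permitted: ['max_batch_tokens']
```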
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
Parameters:
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
Return type:
None
pipeline_role
pipeline_role: PipelineRole
prefer_module_v3
prefer_module_v3: bool
reasoning_parser
reasoning_parser: str | None
use_experimental_kernels
use_experimental_kernels: str
use_vendor_blas
use_vendor_blas: str
use_vendor_ccl
use_vendor_ccl: str
zmq_endpoint_base
zmq_endpoint_base: str