Python class
PipelineRuntimeConfig
class max.pipelines.lib.PipelineRuntimeConfig(*, config_file=None, section_name=None, pipeline_role='prefill_and_decode', max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ep_use_allreduce=False, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, use_experimental_kernels='false', use_vendor_blas='false', use_vendor_ccl='false', custom_architectures=<factory>, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, device_graph_capture=None, force=False, kvcache_ce_watermark=0.95, decode_stall_timeout_s=None, enable_overlap_scheduler=False, prefer_module_v3=False, reasoning_parser=None, defer_resolve=False, max_vision_cache_entries=256, denoising_cache=<factory>)
Bases: ConfigFileModel
Model-agnostic runtime settings for pipeline execution.
Contains batching, scheduling, and execution configuration that is independent of any particular model architecture.
Parameters:
- config_file (str | None)
- section_name (str | None)
- pipeline_role (Literal['prefill_and_decode', 'prefill_only', 'decode_only'])
- max_batch_size (int | None)
- max_queue_size_tg (int | None)
- min_batch_size_tg (int | None)
- ep_size (int)
- ep_use_allreduce (bool)
- ce_delay_ms (float)
- enable_prioritize_first_decode (bool)
- enable_chunked_prefill (bool)
- enable_in_flight_batching (bool)
- max_num_steps (int)
- max_batch_input_tokens (int)
- use_experimental_kernels (str)
- use_vendor_blas (str)
- use_vendor_ccl (str)
- custom_architectures (list[str])
- zmq_endpoint_base (str)
- execute_empty_batches (bool)
- max_batch_total_tokens (int | None)
- device_graph_capture (bool | None)
- force (bool)
- kvcache_ce_watermark (float)
- decode_stall_timeout_s (float | None)
- enable_overlap_scheduler (bool)
- prefer_module_v3 (bool)
- reasoning_parser (str | None)
- defer_resolve (bool)
- max_vision_cache_entries (int)
- denoising_cache (DenoisingCacheConfig)
ce_delay_ms
ce_delay_ms: float
custom_architectures
custom_architectures: list[str]
decode_stall_timeout_s
decode_stall_timeout_s: float | None
defer_resolve
defer_resolve: bool
denoising_cache
denoising_cache: DenoisingCacheConfig
device_graph_capture
device_graph_capture: bool | None
enable_chunked_prefill
enable_chunked_prefill: bool
enable_in_flight_batching
enable_in_flight_batching: bool
enable_overlap_scheduler
enable_overlap_scheduler: bool
enable_prioritize_first_decode
enable_prioritize_first_decode: bool
ep_size
ep_size: int
ep_use_allreduce
ep_use_allreduce: bool
execute_empty_batches
execute_empty_batches: bool
force
force: bool
kvcache_ce_watermark
kvcache_ce_watermark: float
max_batch_input_tokens
max_batch_input_tokens: int
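max_batch_size
max_batch_size: int | None
max_batch_total_tokens
max_batch_total_tokens: int | None
max_num_steps
max_num_steps: int
max_queue_size_tg
max_queue_size_tg: int | None
max_vision_cache_entries
max_vision_cache_entries: int
min_batch_size_tg
min_batch_size_tg: int | None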
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}
Configuration for the model. It should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).
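To make the two `model_config` settings concrete, here is a minimal stdlib-only sketch of the behavior they are meant to produce: `'extra': 'forbid'` rejects unknown field names, and `'strict': False` allows compatible values to be converted to the declared type. `TinyConfig` and `build_forbidding_extras` are hypothetical names invented for this illustration. The code only emulates the intent and is much cruder than pydantic's actual validation rules.

```python
from dataclasses import dataclass, fields


@dataclass
class TinyConfig:
    # Hypothetical two-field config used only for this illustration.
    ep_size: int = 1
    ce_delay_ms: float = 0.0


def build_forbidding_extras(cls, data):
    """Build `cls` from `data`, rejecting unknown keys and coercing values."""
    known = {f.name: f.type for f in fields(cls)}
    unknown = sorted(set(data) - set(known))
    if unknown:
        # Emulates the intent of 'extra': 'forbid'.
        raise ValueError(f"unexpected fields: {unknown}")
    # Emulates the intent of 'strict': False by converting each value
    # with the annotated type's constructor.
    return cls(**{name: known[name](value) for name, value in data.items()})


# A string and an int are accepted and converted to int and float.
cfg = build_forbidding_extras(TinyConfig, {"ep_size": "4", "ce_delay_ms": 2})

# A misspelled key is rejected instead of being silently ignored.
try:
    build_forbidding_extras(TinyConfig, {"ep_sizee": 4})
    rejected = ""
except ValueError as err:
    rejected = str(err)
```

In practice, forbidding extras means a typo in a field name surfaces as an error at construction time instead of leaving the intended setting at its default.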
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that's what pydantic-core passes when calling it.
Parameters:
- self (BaseModel): The BaseModel instance.
- context (Any): The context.
Return type: None
pipeline_role
pipeline_role: PipelineRole
prefer_module_v3
prefer_module_v3: bool
reasoning_parser
reasoning_parser: str | None
use_experimental_kernels
use_experimental_kernels: str
use_vendor_blas
use_vendor_blas: str
use_vendor_ccl
use_vendor_ccl: str
zmq_endpoint_base
zmq_endpoint_base: str