
Python module

config

Standardized configuration for Pipeline Inference.

AudioGenerationConfig

class max.pipelines.lib.config.AudioGenerationConfig(audio_config: 'dict[str, str]', **kwargs: 'Any')

Parameters:

  • audio_config (dict[str, str])
  • kwargs (Any)

audio_decoder

audio_decoder: str = ''

The name of the audio decoder model architecture.

audio_decoder_weights

audio_decoder_weights: str = ''

The path to the audio decoder weights file.

audio_prompt_speakers

audio_prompt_speakers: str = ''

The path to the audio prompt speakers file.

block_causal

block_causal: bool = False

Whether prior buffered tokens should attend to tokens in the current block. Has no effect if buffer is not set.

block_sizes

block_sizes: list[int] | None = None

The block sizes to use for streaming. If this is an int, fixed-size blocks of the given size are used. If this is a list, variable block sizes are used.

buffer

buffer: int | None = None

The number of previous speech tokens to pass to the audio decoder on each generation step.

prepend_prompt_speech_tokens

prepend_prompt_speech_tokens: PrependPromptSpeechTokens = 'never'

Whether the prompt speech tokens should be forwarded to the audio decoder. If “never”, the prompt tokens are not forwarded. If “once”, the prompt tokens are only forwarded on the first block. If “always”, the prompt tokens are forwarded on all blocks.

prepend_prompt_speech_tokens_causal

prepend_prompt_speech_tokens_causal: bool = False

Whether the prompt speech tokens should attend to tokens in the currently generated audio block. Has no effect if prepend_prompt_speech_tokens is “never”. If False (default), the prompt tokens do not attend to the current block. If True, the prompt tokens attend to the current block.
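
A minimal construction sketch for AudioGenerationConfig, assuming the string-valued decoder fields can be supplied through the audio_config mapping and the streaming options as keyword arguments; the exact way fields are populated may differ, and every path and value below is a placeholder.

```python
from max.pipelines.lib.config import AudioGenerationConfig

# Hypothetical values for illustration only.
audio_config = {
    "audio_decoder": "my_audio_decoder_arch",             # decoder architecture name
    "audio_decoder_weights": "/path/to/decoder.weights",  # decoder weights file
    "audio_prompt_speakers": "/path/to/speakers.bin",     # prompt speakers file
}

config = AudioGenerationConfig(
    audio_config,
    # Streaming options, assumed here to be accepted as keyword arguments:
    block_sizes=[64, 128, 256],           # variable-size streaming blocks
    buffer=256,                           # previous speech tokens passed each step
    block_causal=True,                    # buffered tokens attend to the current block
    prepend_prompt_speech_tokens="once",  # forward prompt tokens only on the first block
)
```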

PipelineConfig

class max.pipelines.lib.config.PipelineConfig(**kwargs)

Configuration for a pipeline.

WIP - Once a PipelineConfig is fully initialized, it should be as immutable as possible (frozen=True). All underlying dataclass fields should have been initialized to their default values, whether user-specified via a CLI flag, config file, or environment variable, or set internally to a reasonable default.

Parameters:

kwargs (Any)
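
A minimal sketch of constructing a PipelineConfig, assuming the dataclass fields documented on this page can be passed as keyword arguments; model and weight selection options live on MAXModelConfig and are omitted here. Values are illustrative only.

```python
from max.pipelines.lib.config import PipelineConfig

config = PipelineConfig(
    max_batch_size=32,   # raise above the single-user default for server workloads
    max_length=4096,     # maximum sequence length of the model
    max_new_tokens=512,  # cap on tokens generated per inference pass
    ignore_eos=False,    # stop generation when an EOS token is hit
)
```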

custom_architectures

custom_architectures: list[str]

A list of custom architecture implementations to register. Each input can either be a raw module name or an import path followed by a colon and the module name. Ex:

  • my_module
  • folder/path/to/import:my_module

Each module must expose an ARCHITECTURES list of architectures to register.
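
A hedged sketch of the shape such a module might take and how it could be referenced. The element type of ARCHITECTURES (e.g. an architecture registration object) is not documented on this page, so a placeholder is used.

```python
# my_module.py -- a hypothetical custom architecture module.
my_arch = object()  # placeholder; real entries are architecture registrations
ARCHITECTURES = [my_arch]

# Elsewhere, point the pipeline at the module by name or import path:
from max.pipelines.lib.config import PipelineConfig

config = PipelineConfig(
    custom_architectures=[
        "my_module",                        # raw module name
        "folder/path/to/import:my_module",  # import path, colon, module name
    ],
)
```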

draft_model_config

property draft_model_config: MAXModelConfig | None

enable_chunked_prefill

enable_chunked_prefill: bool = True

Enable chunked prefill to split context encoding requests into multiple chunks based on ‘target_num_new_tokens’.
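
For instance (an illustrative sketch, assuming both fields can be set as keyword arguments), chunked prefill pairs with target_num_new_tokens, documented further below:

```python
from max.pipelines.lib.config import PipelineConfig

config = PipelineConfig(
    enable_chunked_prefill=True,
    # Chunk size target for context encoding; if left unset, a best-guess
    # value is chosen based on model, hardware, and available memory.
    target_num_new_tokens=8192,  # illustrative value, not a recommendation
)
```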

enable_echo

enable_echo: bool = False

Whether the model should be built with echo capabilities.

enable_in_flight_batching

enable_in_flight_batching: bool = False

When enabled, prioritizes token generation by batching it with context encoding requests.

engine

engine: PipelineEngine | None = None

Engine backend to use for serving: ‘max’ for the MAX engine, or ‘huggingface’ as a fallback option for improved model coverage.

graph_quantization_encoding

property graph_quantization_encoding: QuantizationEncoding | None

Converts the CLI encoding to a MAX graph quantization encoding.

Returns:

The graph quantization encoding corresponding to the CLI encoding.

help()

static help()

Documentation for this config class. Returns a dictionary of config options and their descriptions.

Return type:

dict[str, str]
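
For example, the returned mapping can be printed directly (a small usage sketch):

```python
from max.pipelines.lib.config import PipelineConfig

# List every documented config option with its description.
for option, description in PipelineConfig.help().items():
    print(f"{option}: {description}")
```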

ignore_eos

ignore_eos: bool = False

Ignore EOS and continue generating tokens, even when an EOS token is hit.

max_batch_size

max_batch_size: int | None = None

Maximum batch size to execute with the model. This is set to one to minimize memory consumption for the base case, in which a user is running a local server to test out MAX. When launching in a server scenario, this value should be set higher based on server capacity.

max_ce_batch_size

max_ce_batch_size: int = 192

Maximum cache size to reserve for a single context encoding batch. The actual limit is the lesser of this and max_batch_size.

max_length

max_length: int | None = None

Maximum sequence length of the model.

max_new_tokens

max_new_tokens: int = -1

Maximum number of new tokens to generate during a single inference pass of the model.

max_num_steps

max_num_steps: int = -1

The number of steps to run for multi-step scheduling. -1 specifies a default value based on configuration and platform. Ignored for models which are not auto-regressive (e.g. embedding models).

model_config

property model_config: MAXModelConfig

pad_to_multiple_of

pad_to_multiple_of: int = 2

Pad input tensors to be a multiple of the provided value.

pdl_level

pdl_level: str = '1'

Level of overlap of kernel launches via programmatic dependent launch (PDL) control.

pipeline_role

pipeline_role: PipelineRole = 'prefill_and_decode'

Whether the pipeline should serve a prefill role, a decode role, or both.

pool_embeddings

pool_embeddings: bool = True

Whether to pool embedding outputs.

profiling_config

property profiling_config: ProfilingConfig

resolve()

resolve()

Validates and resolves the config.

This method is called after the config is initialized, to ensure that all config fields have been initialized to a valid state.

Return type:

None
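
A small usage sketch; since the docstring notes that resolve() runs after initialization, an explicit call as shown here may be redundant and is for illustration only.

```python
from max.pipelines.lib.config import PipelineConfig

config = PipelineConfig(max_batch_size=32)
config.resolve()  # validate the config and resolve any remaining fields
```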

sampling_config

property sampling_config: SamplingConfig

target_num_new_tokens

target_num_new_tokens: int | None = None

The target number of un-encoded tokens to include in each batch. If not set, this will be set to a best-guess optimal value based on model, hardware, and available memory.

use_experimental_kernels

use_experimental_kernels: str = 'false'

PrependPromptSpeechTokens

class max.pipelines.lib.config.PrependPromptSpeechTokens(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

ALWAYS

ALWAYS = 'always'

NEVER

NEVER = 'never'

ONCE

ONCE = 'once'
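
The enum members wrap the string values accepted by AudioGenerationConfig.prepend_prompt_speech_tokens; a small sketch of enumerating them:

```python
from max.pipelines.lib.config import PrependPromptSpeechTokens

for member in PrependPromptSpeechTokens:
    print(member.name, "->", member.value)  # ALWAYS -> always, NEVER -> never, ONCE -> once
```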