Python class

`AudioGenerationConfig`

class max.pipelines.AudioGenerationConfig(audio_decoder, audio_decoder_weights='', chunk_size=None, buffer=0, block_causal=False, prepend_prompt_speech_tokens='never', prepend_prompt_speech_tokens_causal=False, run_model_test_mode=False, prometheus_metrics_mode='instrument_only', *, config_file=None, section_name=None, debug_verify_replay=False, models=<factory>, model_override=<factory>, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, runtime=<factory>, audio_decoder_config=<factory>)

Bases: PipelineConfig

Configuration for an audio generation pipeline.

Parameters:

audio_decoder (str)
audio_decoder_weights (str)
chunk_size (list[int] | None)
buffer (int)
block_causal (bool)
prepend_prompt_speech_tokens (Literal['never', 'once', 'rolling'])
prepend_prompt_speech_tokens_causal (bool)
run_model_test_mode (bool)
prometheus_metrics_mode (Literal['instrument_only', 'launch_server', 'launch_multiproc_server'])
config_file (str | None)
section_name (str | None)
debug_verify_replay (bool)
models (dict[str, MAXModelConfig])
model_override (list[str])
sampling (SamplingConfig)
profiling (ProfilingConfig)
lora (LoRAConfig | None)
speculative (SpeculativeConfig | None)
runtime (PipelineRuntimeConfig)
audio_decoder_config (dict[str, Any])
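
The audio-specific parameters above can be collected and sanity-checked before they are handed to the class. The following is a minimal sketch, not part of MAX: `build_audio_config_kwargs` is a hypothetical helper, `"my_decoder"` is a placeholder architecture name, and the allowed literal values are copied from the parameter list above. Other fields (such as `models`) may still need to be set for a working pipeline.

```python
from __future__ import annotations

from typing import Any, Literal, get_args

# Allowed values, copied from the documented parameter list.
PrependPromptSpeechTokens = Literal["never", "once", "rolling"]
PrometheusMetricsMode = Literal[
    "instrument_only", "launch_server", "launch_multiproc_server"
]


def build_audio_config_kwargs(
    audio_decoder: str,
    audio_decoder_weights: str = "",
    chunk_size: list[int] | None = None,
    buffer: int = 0,
    block_causal: bool = False,
    prepend_prompt_speech_tokens: str = "never",
    prometheus_metrics_mode: str = "instrument_only",
) -> dict[str, Any]:
    """Hypothetical helper: validate audio options and return them as kwargs."""
    if prepend_prompt_speech_tokens not in get_args(PrependPromptSpeechTokens):
        raise ValueError(
            f"unknown prepend_prompt_speech_tokens: {prepend_prompt_speech_tokens!r}"
        )
    if prometheus_metrics_mode not in get_args(PrometheusMetricsMode):
        raise ValueError(
            f"unknown prometheus_metrics_mode: {prometheus_metrics_mode!r}"
        )
    if buffer < 0:
        raise ValueError("buffer must be non-negative")
    return {
        "audio_decoder": audio_decoder,
        "audio_decoder_weights": audio_decoder_weights,
        "chunk_size": chunk_size,
        "buffer": buffer,
        "block_causal": block_causal,
        "prepend_prompt_speech_tokens": prepend_prompt_speech_tokens,
        "prometheus_metrics_mode": prometheus_metrics_mode,
    }


kwargs = build_audio_config_kwargs(
    "my_decoder",  # placeholder, not a real architecture name
    chunk_size=[10, 20],
    buffer=4,
    prepend_prompt_speech_tokens="once",
)

# With MAX installed, the result could then be passed on, for example:
#   from max.pipelines import AudioGenerationConfig
#   config = AudioGenerationConfig(**kwargs)
```

Validating the literal-typed options up front surfaces a misspelled mode as a clear `ValueError` before any model loading starts.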

`audio_decoder`

audio_decoder: str

The name of the audio decoder model architecture.

`audio_decoder_config`

audio_decoder_config: dict[str, Any]

Parameters to pass to the audio decoder model.

`audio_decoder_weights`

audio_decoder_weights: str

The path to the audio decoder weights file.

`block_causal`

block_causal: bool

Whether prior buffered tokens attend to tokens in the current block.

`buffer`

buffer: int

The number of previous speech tokens to pass to the audio decoder on each generation step.

`chunk_size`

chunk_size: list[int] | None

The chunk sizes to use for streaming.
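
To make the interplay of `chunk_size` and `buffer` concrete, here is a rough sketch of how a stream of speech tokens might be cut into decoder inputs. It is an illustration under stated assumptions, not MAX's implementation: `iter_decoder_windows` is a hypothetical function, and it assumes that chunk sizes are consumed in order, that the last size repeats until the tokens run out, and that `buffer` tokens immediately preceding each chunk are supplied as context.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


def iter_decoder_windows(
    tokens: Sequence[int],
    chunk_sizes: Sequence[int],
    buffer: int = 0,
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (context, chunk) pairs for a hypothetical streaming decoder.

    Assumption: chunk sizes are used in order and the last one repeats.
    `context` holds up to `buffer` tokens that immediately precede the chunk.
    """
    if not chunk_sizes or any(size <= 0 for size in chunk_sizes):
        raise ValueError("chunk_sizes must contain positive integers")
    if buffer < 0:
        raise ValueError("buffer must be non-negative")

    start = 0
    step = 0
    while start < len(tokens):
        # Reuse the final chunk size once the schedule is exhausted.
        size = chunk_sizes[min(step, len(chunk_sizes) - 1)]
        end = min(start + size, len(tokens))
        context = list(tokens[max(0, start - buffer):start])
        yield context, list(tokens[start:end])
        start = end
        step += 1


windows = list(iter_decoder_windows(list(range(10)), chunk_sizes=[2, 4], buffer=3))
for context, chunk in windows:
    print(f"context={context} chunk={chunk}")
```

In this sketch a small first chunk keeps time-to-first-audio low, while the buffered context gives each later chunk some of the tokens that preceded it.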

`from_flags()`

classmethod from_flags(audio_flags, **config_flags)

Builds an AudioGenerationConfig from audio CLI flags and config kwargs.

Parameters:

audio_flags (dict[str, str])
config_flags (Any)

Return type:

AudioGenerationConfig
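
Because `audio_flags` maps strings to strings, its values have to be converted to the typed fields documented on this class at some point. The sketch below shows one way such a conversion could look. It is not the parsing that `from_flags()` performs: `coerce_audio_flags` is a hypothetical helper, and the comma-separated `chunk_size` format, the accepted boolean spellings, and the `"my_decoder"` name are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any

# Assumed spellings for boolean flags; MAX may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _parse_bool(text: str) -> bool:
    lowered = text.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"not a boolean flag value: {text!r}")


def coerce_audio_flags(audio_flags: dict[str, str]) -> dict[str, Any]:
    """Hypothetical helper: turn string flag values into typed values."""
    typed: dict[str, Any] = {}
    for name, text in audio_flags.items():
        if name == "chunk_size":
            # Assumed format: comma-separated integers, empty string for None.
            typed[name] = (
                [int(part) for part in text.split(",")] if text.strip() else None
            )
        elif name == "buffer":
            typed[name] = int(text)
        elif name in ("block_causal", "prepend_prompt_speech_tokens_causal"):
            typed[name] = _parse_bool(text)
        else:
            # String-typed fields such as audio_decoder pass through unchanged.
            typed[name] = text
    return typed


typed = coerce_audio_flags(
    {
        "audio_decoder": "my_decoder",  # placeholder name
        "chunk_size": "10, 20",
        "buffer": "4",
        "block_causal": "true",
    }
)
```

When using MAX itself, pass the string-valued flags directly as `AudioGenerationConfig.from_flags(audio_flags, **config_flags)` and let the classmethod handle the conversion.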

`model_config`

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'strict': False}

Configuration for the model. It should be a dictionary conforming to pydantic's `ConfigDict` (`pydantic.config.ConfigDict`).

`model_post_init()`

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self (BaseModel) – The BaseModel instance.
context (Any) – The context.

Return type:

None

`prepend_prompt_speech_tokens`

prepend_prompt_speech_tokens: PrependPromptSpeechTokens

Controls whether and when the prompt speech tokens are forwarded to the audio decoder. One of 'never' (the default), 'once', or 'rolling'.

`prepend_prompt_speech_tokens_causal`

prepend_prompt_speech_tokens_causal: bool

Whether the prompt speech tokens attend to tokens in the current audio block.

`prometheus_metrics_mode`

prometheus_metrics_mode: PrometheusMetricsMode

The mode to use for Prometheus metrics. One of 'instrument_only' (the default), 'launch_server', or 'launch_multiproc_server'.
