Python class

AudioGenerationConfig

class max.pipelines.AudioGenerationConfig(audio_decoder, audio_decoder_weights='', chunk_size=None, buffer=0, block_causal=False, prepend_prompt_speech_tokens='never', prepend_prompt_speech_tokens_causal=False, run_model_test_mode=False, prometheus_metrics_mode='instrument_only', *, config_file=None, section_name=None, debug_verify_replay=False, models=<factory>, model_override=<factory>, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, runtime=<factory>, audio_decoder_config=<factory>)

Bases: PipelineConfig

Configuration for an audio generation pipeline.
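
The audio-specific defaults in the signature above can be summarised with a small stand-in. This is a minimal sketch, not the real class: `AudioSettingsSketch` is a hypothetical dataclass that mirrors only the audio-related fields, and it types `prepend_prompt_speech_tokens` as a plain string because the `PrependPromptSpeechTokens` type is not described here. The decoder name `"my_decoder"` is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioSettingsSketch:
    """Hypothetical stand-in, NOT max.pipelines.AudioGenerationConfig."""

    audio_decoder: str  # required: the signature gives it no default
    audio_decoder_weights: str = ""
    chunk_size: Optional[list[int]] = None
    buffer: int = 0
    block_causal: bool = False
    # Assumption: modelled as a str; the real field is a PrependPromptSpeechTokens.
    prepend_prompt_speech_tokens: str = "never"
    prepend_prompt_speech_tokens_causal: bool = False


# Only audio_decoder must be supplied; everything else falls back to its default.
settings = AudioSettingsSketch(audio_decoder="my_decoder", chunk_size=[10, 20], buffer=4)
print(settings)
```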

Parameters:

audio_decoder

audio_decoder: str

The name of the audio decoder model architecture.

audio_decoder_config

audio_decoder_config: dict[str, Any]

Parameters to pass to the audio decoder model.

audio_decoder_weights

audio_decoder_weights: str

The path to the audio decoder weights file.

block_causal

block_causal: bool

Whether prior buffered tokens attend to tokens in the current block.
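
To make the attention question concrete, the sketch below builds a boolean mask over buffered tokens followed by the current block. It is an illustration only: `build_block_mask` and its `buffer_attends_to_block` argument are hypothetical names, and which value of `block_causal` corresponds to which mask in MAX is not confirmed by this page.

```python
def build_block_mask(
    num_buffer: int, num_block: int, buffer_attends_to_block: bool
) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Positions 0..num_buffer-1 are buffered (prior) tokens; the remaining
    positions belong to the current block.
    """
    total = num_buffer + num_block
    mask = []
    for q in range(total):
        row = []
        for k in range(total):
            query_is_buffer = q < num_buffer
            key_is_block = k >= num_buffer
            if query_is_buffer and key_is_block:
                # The only entries the flag controls in this sketch.
                row.append(buffer_attends_to_block)
            else:
                row.append(True)
        mask.append(row)
    return mask


# Two buffered tokens followed by a block of two tokens.
for row in build_block_mask(2, 2, buffer_attends_to_block=False):
    print(["x" if allowed else "." for allowed in row])
```

In this sketch, block tokens can always see the buffer, and only the buffer-to-block direction is switched.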

buffer

buffer: int

The number of previous speech tokens to pass to the audio decoder on each generation step.
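
One way to picture this is a sliding window over the speech tokens. The sketch below is an assumption about how such a buffer could be applied, not the MAX implementation: `decoder_inputs` and `block_size` are hypothetical names introduced for illustration.

```python
from typing import Iterator


def decoder_inputs(
    speech_tokens: list[int], block_size: int, buffer: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (context, block) pairs, one per generation step.

    `context` holds up to `buffer` tokens that immediately precede `block`.
    """
    for start in range(0, len(speech_tokens), block_size):
        block = speech_tokens[start:start + block_size]
        context = speech_tokens[max(0, start - buffer):start]
        yield context, block


steps = list(decoder_inputs(list(range(10)), block_size=4, buffer=2))
for context, block in steps:
    print("context:", context, "block:", block)
```

With `buffer=0` the context is always empty, which matches the default in the signature.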

chunk_size

chunk_size: list[int] | None

The chunk sizes to use for streaming.
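
The page does not say how a list of chunk sizes is consumed, so the sketch below states its own assumptions: sizes are used in order, the last size repeats once the list is exhausted, and `None` means a single chunk. `split_into_chunks` is a hypothetical helper, not part of MAX.

```python
from typing import Iterator, Optional, Sequence


def split_into_chunks(
    tokens: Sequence[int], chunk_size: Optional[list[int]]
) -> Iterator[list[int]]:
    """Split `tokens` into consecutive chunks following `chunk_size`."""
    if not chunk_size:
        # Assumption: no chunk sizes means everything goes out in one piece.
        yield list(tokens)
        return
    if any(size <= 0 for size in chunk_size):
        raise ValueError("chunk sizes must be positive")
    start = 0
    index = 0
    while start < len(tokens):
        # Assumption: reuse the final size once the list runs out.
        size = chunk_size[min(index, len(chunk_size) - 1)]
        yield list(tokens[start:start + size])
        start += size
        index += 1


chunks = list(split_into_chunks(list(range(7)), [2, 3]))
print(chunks)
```

A small first chunk followed by larger ones is a common streaming pattern because it shortens the wait for the first audio, though whether MAX uses the list this way is not stated here.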

from_flags()

classmethod from_flags(audio_flags, **config_flags)

Builds an AudioGenerationConfig from audio CLI flags and config kwargs.

Parameters:

  • audio_flags – The audio CLI flags.
  • config_flags – Additional configuration keyword arguments.

Return type:

AudioGenerationConfig
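
The general shape of such an alternate constructor can be sketched with a stand-in class. Everything here is hypothetical: `FlagConfigSketch`, the string-valued `audio_flags` dictionary, the converters, and the rule that keyword arguments override flags are assumptions for illustration, not the behaviour of `AudioGenerationConfig.from_flags()`.

```python
from dataclasses import dataclass
from typing import Any

# Assumption: flags arrive as strings and need converting to typed values.
_CONVERTERS = {
    "audio_decoder": str,
    "buffer": int,
    "block_causal": lambda raw: raw.lower() == "true",
}


@dataclass
class FlagConfigSketch:
    """Hypothetical stand-in, NOT max.pipelines.AudioGenerationConfig."""

    audio_decoder: str
    buffer: int = 0
    block_causal: bool = False

    @classmethod
    def from_flags(
        cls, audio_flags: dict[str, str], **config_flags: Any
    ) -> "FlagConfigSketch":
        # An unknown flag name raises KeyError here.
        parsed = {name: _CONVERTERS[name](raw) for name, raw in audio_flags.items()}
        # Assumption: explicit keyword arguments take precedence over flags.
        return cls(**{**parsed, **config_flags})


config = FlagConfigSketch.from_flags(
    {"audio_decoder": "my_decoder", "buffer": "4"}, block_causal=True
)
print(config)
```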

model_config

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'strict': False}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init()

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

prepend_prompt_speech_tokens

prepend_prompt_speech_tokens: PrependPromptSpeechTokens

Whether the prompt speech tokens are forwarded to the audio decoder.

prepend_prompt_speech_tokens_causal

prepend_prompt_speech_tokens_causal: bool

Whether the prompt speech tokens attend to tokens in the current audio block.

prometheus_metrics_mode

prometheus_metrics_mode: PrometheusMetricsMode

The mode to use for Prometheus metrics.