
Python class

AudioGenerationRequest


class max.interfaces.AudioGenerationRequest(request_id, model, input=None, audio_prompt_tokens=<factory>, audio_prompt_transcription='', sampling_params=<factory>, _assistant_message_override=None, prompt=None, streaming=True, buffer_speech_tokens=None)


Bases: object

An immutable request for audio generation from a pipeline.
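As a rough illustration of what "immutable" means in practice, the sketch below uses a hypothetical stand-in class, `AudioRequestSketch`, that mirrors a subset of the documented fields as a frozen dataclass. It is not the real `max.interfaces.AudioGenerationRequest`. That the real class behaves like a frozen dataclass, that `RequestID` can be treated as a string, and that the `<factory>` default for `audio_prompt_tokens` is an empty list are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field, replace


# Hypothetical stand-in, NOT the real max.interfaces.AudioGenerationRequest.
# It mirrors only a subset of the documented fields and defaults.
@dataclass(frozen=True)
class AudioRequestSketch:
    request_id: str  # assumption: the real field is a RequestID
    model: str
    input: str | None = None
    # Assumption: the documented <factory> default yields an empty list.
    audio_prompt_tokens: list[int] = field(default_factory=list)
    audio_prompt_transcription: str = ""
    streaming: bool = True


request = AudioRequestSketch(
    request_id="req-1",
    model="example-tts-model",  # placeholder model name
    input="Hello, world.",
)

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    request.streaming = False
    mutated = True
except FrozenInstanceError:
    mutated = False

# To change a field, build a new request instead of mutating the old one.
non_streaming = replace(request, streaming=False)
```

Under these assumptions, a caller that needs a variant of a request (for example, a non-streaming copy) creates a new object and leaves the original untouched.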

Parameters:

audio_prompt_tokens

audio_prompt_tokens: list[int]


The speech token IDs of the audio prompt to use for audio generation.

audio_prompt_transcription

audio_prompt_transcription: str = ''


The transcription of the audio prompt to use for audio generation. Defaults to an empty string.

buffer_speech_tokens

buffer_speech_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]] | None = None


An optional array holding the last N speech tokens that the model generated for a previous request.

When this field is set, the array is used to buffer the tokens sent to the audio decoder.

input

input: str | None = None


The text to generate audio for. The maximum length is 4096 characters.
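A caller may want to check the length limit before building a request. The following is a minimal sketch: `check_input_length` and `MAX_INPUT_CHARS` are hypothetical names, not part of `max.interfaces`, and whether the library enforces the limit itself is not stated here. The sketch also assumes that "characters" means what Python's `len()` counts for a `str`.

```python
MAX_INPUT_CHARS = 4096  # limit stated in the field description


def check_input_length(text: str) -> str:
    """Return text unchanged, or raise ValueError if it exceeds the limit.

    Hypothetical helper for illustration only.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input has {len(text)} characters; the maximum is {MAX_INPUT_CHARS}"
        )
    return text
```

Text of exactly 4096 characters passes the check; anything longer raises `ValueError`, so the caller can split or truncate the text before sending it.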

model

model: str


The name of the model to use for generating audio chunks. This should match one of the models available on the server, and it determines the behavior and capabilities of the response generation.

prompt

prompt: list[int] | str | None = None


An optional preprocessed list of token IDs or a prompt string to pass directly into the model as input. When provided, it replaces the TokenGeneratorRequestMessages that would otherwise be generated automatically from the input, audio_prompt_tokens, and audio_prompt_transcription fields.
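The precedence described above can be sketched as follows. `resolve_model_input` is a hypothetical helper, not part of `max.interfaces`, and the dictionary returned on the fallback path only stands in for the automatically generated messages, whose real structure is not described here.

```python
from __future__ import annotations


def resolve_model_input(
    prompt: list[int] | str | None,
    input: str | None,
    audio_prompt_tokens: list[int],
    audio_prompt_transcription: str,
) -> tuple[str, object]:
    """Hypothetical illustration of the documented precedence."""
    if prompt is not None:
        # An explicit prompt (token IDs or a string) is passed through as-is.
        return ("prompt", prompt)
    # Otherwise the model input is derived from the other three fields.
    # This dict is a placeholder for the generated messages.
    return (
        "generated",
        {
            "input": input,
            "audio_prompt_tokens": audio_prompt_tokens,
            "audio_prompt_transcription": audio_prompt_transcription,
        },
    )
```

In this sketch, `resolve_model_input([1, 2, 3], "ignored", [], "")` takes the first branch and ignores the other fields, while passing `None` as the prompt falls back to the remaining three.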

request_id

request_id: RequestID


A unique identifier for the request.

sampling_params

sampling_params: SamplingParams


Sampling configuration options for the request.

streaming

streaming: bool = True


Whether to stream the generated audio. Defaults to True.