Python class

AudioGenerationRequest

class max.interfaces.AudioGenerationRequest(request_id, model, input=None, audio_prompt_tokens=<factory>, audio_prompt_transcription='', sampling_params=<factory>, _assistant_message_override=None, prompt=None, streaming=True, buffer_speech_tokens=None)

Bases: object

An immutable request for audio generation from a pipeline.

Parameters:

audio_prompt_tokens

audio_prompt_tokens: list[int]

The prompt speech IDs to use for audio generation.

audio_prompt_transcription

audio_prompt_transcription: str = ''

The audio prompt transcription to use for audio generation.

buffer_speech_tokens

buffer_speech_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]] | None = None

An optional field that may contain the last N speech tokens generated by the model in a previous request.

When specified, this tensor is used to buffer the tokens sent to the audio decoder.
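As the type annotation above indicates, the buffered tokens are a 1-D integer NumPy array. The sketch below shows one way to carry the last N speech tokens from a previous request into this field; the token values and the choice of N are made up for illustration:

```python
import numpy as np

# Hypothetical speech tokens produced by a previous request.
previous_speech_tokens = [101, 205, 342, 17, 88, 964, 412, 53]

# Keep only the last N tokens to seed the next request's decoder buffer.
N = 4
buffer_speech_tokens = np.asarray(previous_speech_tokens[-N:], dtype=np.int64)

print(list(buffer_speech_tokens))  # the last 4 tokens: [88, 964, 412, 53]
```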

input

input: str | None = None

The text to generate audio for. The maximum length is 4096 characters.

model

model: str

The name of the model to be used for generating audio chunks. This should match the available models on the server and determines the behavior and capabilities of the response generation.

prompt

prompt: list[int] | str | None = None

Optionally provide a preprocessed list of token IDs, or a prompt string, to pass directly into the model as input. If provided, this bypasses the automatic construction of TokenGeneratorRequestMessages from the input, audio prompt tokens, and audio prompt transcription fields.

request_id

request_id: RequestID

A unique identifier for the request.

sampling_params

sampling_params: SamplingParams

Request sampling configuration options.

streaming

streaming: bool = True

Whether to stream the audio generation.