Python class
TTSContext
class max.pipelines.TTSContext(*, max_length, tokens, request_id=<factory>, eos_tracker=<factory>, log_probabilities=0, log_probabilities_echo=False, ignore_eos=False, json_schema=None, sampling_params=<factory>, model_name='', _matcher=None, status=GenerationStatus.ACTIVE, _log_probabilities_data=<factory>, _is_initial_prompt=True, _draft_offset=0, _spec_decoding_state=None, target_endpoint=None, external_block_metadata=None, cached_prefix_length=None, audio_prompt_tokens=<factory>, buffer_speech_tokens=None, audio_buffer=None, prev_samples_beyond_offset=0, streaming=False, _speech_token_size=128, _speech_token_end_idx=0, _speech_tokens=<factory>, decoded_index=0, _block_counter=0, _arrival_time=<factory>, audio_generation_status=GenerationStatus.ACTIVE)
Bases: TextContext
A context for Text-to-Speech (TTS) model inference.
This class extends TextContext to handle speech token generation and management. It maintains buffers for audio prompt tokens and generated speech tokens, along with indices that track decoding progress.
Parameters:
- max_length (int)
- tokens (TokenBuffer)
- request_id (RequestID)
- eos_tracker (EOSTracker)
- log_probabilities (int)
- log_probabilities_echo (bool)
- ignore_eos (bool)
- json_schema (str | None)
- sampling_params (SamplingParams)
- model_name (str)
- _matcher (Any | None)
- status (GenerationStatus)
- _log_probabilities_data (dict[int, LogProbabilities])
- _is_initial_prompt (bool)
- _draft_offset (int)
- _spec_decoding_state (SpecDecodingState | None)
- target_endpoint (str | None)
- external_block_metadata (Any)
- cached_prefix_length (int | None)
- audio_prompt_tokens (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Array of input audio prompt tokens used for voice cloning
- buffer_speech_tokens (ndarray[tuple[Any, ...], dtype[integer[Any]]] | None)
- audio_buffer (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
- prev_samples_beyond_offset (int)
- streaming (bool) – Whether the request streams audio to the client
- _speech_token_size (int) – Size of the speech token buffer; defaults to 128
- _speech_token_end_idx (int) – Index marking the end of valid speech tokens
- _speech_tokens (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Buffer containing the generated speech tokens
- decoded_index (int)
- _block_counter (int) – Counter tracking the number of speech token blocks generated
- _arrival_time (float)
- audio_generation_status (GenerationStatus)
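For orientation, here is a minimal construction sketch. It assumes the text prompt is already tokenized and that a plain integer array is accepted for tokens (the annotated type is TokenBuffer); in a real deployment, the serving pipeline typically builds TTSContext instances for you, and all factory-default fields are filled in automatically.

```python
import numpy as np
from max.pipelines import TTSContext

# Hypothetical token IDs produced by the model's text tokenizer.
prompt_ids = np.array([101, 2023, 2003, 102], dtype=np.int64)

ctx = TTSContext(
    max_length=1024,    # maximum total sequence length for this request
    tokens=prompt_ids,  # assumption: an integer array is accepted here
    streaming=True,     # stream audio to the client as it is generated
)
```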
audio_buffer
audio_buffer: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
audio_generation_status
audio_generation_status: GenerationStatus = GenerationStatus.ACTIVE
audio_prompt_tokens
audio_prompt_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]]
block_counter
property block_counter: int
The number of speech token blocks generated.
buffer_speech_tokens
buffer_speech_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]] | None = None
decoded_index
decoded_index: int = 0
is_done
property is_done: bool
Whether audio generation has finished.
next_speech_tokens()
next_speech_tokens(audio_chunk_size=None, buffer=None)
Returns a chunk of the next unseen speech tokens.
Calling this function will not update the index of the last seen token. This must be done by setting decoded_index after the chunk is processed.
Parameters:

- audio_chunk_size (int | None)
- buffer (ndarray[tuple[Any, ...], dtype[integer[Any]]] | None)

Returns:

A tuple of (chunk of speech tokens, buffer).

Return type:

tuple[ndarray[tuple[Any, ...], dtype[integer[Any]]], ndarray[tuple[Any, ...], dtype[integer[Any]]]]
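Per the note above, the intended consumption pattern is to fetch a chunk, process it, and then advance decoded_index yourself. Below is a sketch of that loop; it assumes next_speech_tokens returns an empty chunk when no new tokens are available yet and that decoded_index advances by the chunk length. vocode and stream_audio are hypothetical stand-ins for an audio decoder and a transport step.

```python
buffer = None
while not ctx.is_done:
    # Fetch the next unseen speech tokens; this does NOT advance decoded_index.
    chunk, buffer = ctx.next_speech_tokens(audio_chunk_size=64, buffer=buffer)
    if len(chunk) == 0:
        continue  # nothing new yet; the model is still generating
    stream_audio(vocode(chunk))      # hypothetical: decode tokens to audio and send
    ctx.decoded_index += len(chunk)  # mark the chunk as seen, as the docs require
```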
prev_samples_beyond_offset
prev_samples_beyond_offset: int = 0
speech_tokens
property speech_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]]
The slice of generated speech tokens valid so far.
streaming
streaming: bool = False
update_speech_tokens()
update_speech_tokens(new_tokens)
Updates the buffer with new speech tokens.
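A short sketch of feeding generated tokens into the context. It assumes update_speech_tokens appends new_tokens to the internal buffer (as the description suggests) and that new_tokens is an integer ndarray matching the buffer's dtype.

```python
import numpy as np

# Hypothetical output of one model step: a block of new speech token IDs.
new_tokens = np.array([412, 87, 1933], dtype=np.int64)

ctx.update_speech_tokens(new_tokens)  # append to the speech token buffer
print(ctx.speech_tokens)              # every valid speech token generated so far
print(ctx.block_counter)              # number of speech token blocks recorded
```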