Python class
TTSContext
class max.pipelines.TTSContext(*, max_length, tokens, request_id=<factory>, eos_tracker=<factory>, log_probabilities=0, log_probabilities_echo=False, ignore_eos=False, json_schema=None, sampling_params=<factory>, model_name='', _matcher=None, status=GenerationStatus.ACTIVE, _log_probabilities_data=<factory>, _is_initial_prompt=True, _draft_offset=0, _spec_decoding_state=None, in_reasoning_phase=False, target_endpoint=None, external_block_metadata=None, cached_prefix_length=None, _cache_metrics_emitted=False, audio_prompt_tokens=<factory>, buffer_speech_tokens=None, audio_buffer=None, prev_samples_beyond_offset=0, streaming=False, _speech_token_size=128, _speech_token_end_idx=0, _speech_tokens=<factory>, decoded_index=0, _block_counter=0, _arrival_time=<factory>, audio_generation_status=GenerationStatus.ACTIVE)
Bases: TextContext
A context for Text-to-Speech (TTS) model inference.
This class extends TextContext to handle speech token generation and management. It maintains buffers for audio prompt tokens and generated speech tokens, along with tracking indices for decoding progress.
Parameters:
- max_length (int)
- tokens (TokenBuffer)
- request_id (RequestID)
- eos_tracker (EOSTracker)
- log_probabilities (int)
- log_probabilities_echo (bool)
- ignore_eos (bool)
- json_schema (str | None)
- sampling_params (SamplingParams)
- model_name (str)
- _matcher (Any | None)
- status (GenerationStatus)
- _log_probabilities_data (dict[int, LogProbabilities])
- _is_initial_prompt (bool)
- _draft_offset (int)
- _spec_decoding_state (SpecDecodingState | None)
- in_reasoning_phase (bool)
- target_endpoint (str | None)
- external_block_metadata (Any)
- cached_prefix_length (int | None)
- _cache_metrics_emitted (bool)
- audio_prompt_tokens (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Array of input audio prompt tokens used for voice cloning
- buffer_speech_tokens (ndarray[tuple[Any, ...], dtype[integer[Any]]] | None)
- audio_buffer (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
- prev_samples_beyond_offset (int)
- streaming (bool) – Whether the request streams the generated audio to the client
- _speech_token_size (int) – Size of the speech token buffer; defaults to 128
- _speech_token_end_idx (int) – Index marking the end of valid speech tokens
- _speech_tokens (ndarray[tuple[Any, ...], dtype[integer[Any]]]) – Buffer containing the generated speech tokens
- decoded_index (int)
- _block_counter (int) – Counter tracking the number of speech token blocks generated
- _arrival_time (float)
- audio_generation_status (GenerationStatus)
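The interplay of the buffer fields above (`_speech_tokens`, `_speech_token_end_idx`, `decoded_index`, `_block_counter`) can be sketched in plain Python. This is a simplified stand-in, not the real `max.pipelines` implementation: numpy arrays are replaced by lists, and the doubling growth policy is an assumption.

```python
# Illustrative sketch of TTSContext's speech-token buffer fields.
# NOT the real max.pipelines code: lists stand in for numpy arrays,
# and the buffer-doubling growth policy is an assumption.

class SpeechTokenBufferSketch:
    def __init__(self, size=128):
        self._speech_tokens = [0] * size   # backing buffer (_speech_tokens)
        self._speech_token_end_idx = 0     # end of the valid region
        self.decoded_index = 0             # last token handed to the decoder
        self._block_counter = 0            # blocks of tokens appended so far

    @property
    def speech_tokens(self):
        # Mirrors the `speech_tokens` property: only the valid slice.
        return self._speech_tokens[: self._speech_token_end_idx]

    def update_speech_tokens(self, new_tokens):
        # Mirrors `update_speech_tokens`: append a block, doubling the
        # backing buffer when it runs out of room (assumed policy).
        needed = self._speech_token_end_idx + len(new_tokens)
        while len(self._speech_tokens) < needed:
            self._speech_tokens.extend([0] * len(self._speech_tokens))
        self._speech_tokens[self._speech_token_end_idx:needed] = list(new_tokens)
        self._speech_token_end_idx = needed
        self._block_counter += 1

buf = SpeechTokenBufferSketch(size=4)
buf.update_speech_tokens([10, 11, 12])
buf.update_speech_tokens([13, 14, 15])
print(buf.speech_tokens)   # [10, 11, 12, 13, 14, 15]
print(buf._block_counter)  # 2
```

Note that appending tokens advances only `_speech_token_end_idx`; `decoded_index` stays put until a consumer explicitly marks tokens as decoded.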
audio_buffer
audio_buffer: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
audio_generation_status
audio_generation_status: GenerationStatus = 'active'
audio_prompt_tokens
audio_prompt_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]]
block_counter
property block_counter: int
The number of speech token blocks generated.
buffer_speech_tokens
buffer_speech_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]] | None = None
decoded_index
decoded_index: int = 0
is_done
property is_done: bool
Whether audio generation has finished.
next_speech_tokens()
next_speech_tokens(audio_chunk_size=None, buffer=None)
Returns a chunk of the next unseen speech tokens.
Calling this function does not update the index of the last seen token; the caller must set decoded_index after the chunk has been processed.
Parameters:
- audio_chunk_size – Optional; defaults to None.
- buffer – Optional; defaults to None.
Returns:
A tuple of (chunk of speech tokens, buffer).
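The decoded_index contract described above can be illustrated with a hypothetical consumption loop. `MockTTSContext` below is a stand-in, not the real class; only the documented behavior (returning unseen tokens without advancing `decoded_index`) is modeled.

```python
# Hypothetical consumption loop showing the decoded_index contract of
# next_speech_tokens(): the method returns unseen tokens but does NOT
# advance decoded_index; the caller sets it after processing each chunk.
# MockTTSContext is an illustrative stand-in, not max.pipelines code.

class MockTTSContext:
    def __init__(self, tokens):
        self._tokens = list(tokens)
        self.decoded_index = 0  # boundary of tokens already processed

    def next_speech_tokens(self, audio_chunk_size=None, buffer=None):
        # Return up to audio_chunk_size unseen tokens, leaving
        # decoded_index untouched (as the real method documents).
        end = len(self._tokens)
        if audio_chunk_size is not None:
            end = min(end, self.decoded_index + audio_chunk_size)
        return self._tokens[self.decoded_index:end], buffer

ctx = MockTTSContext(range(10))
chunks = []
while True:
    chunk, _ = ctx.next_speech_tokens(audio_chunk_size=4)
    if not chunk:
        break
    chunks.append(chunk)              # e.g. decode the chunk to audio here
    ctx.decoded_index += len(chunk)   # only now mark the tokens as seen

print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Forgetting to advance `decoded_index` would make the loop receive the same chunk forever, which is why the docstring calls out this responsibility explicitly.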
prev_samples_beyond_offset
prev_samples_beyond_offset: int = 0
speech_tokens
property speech_tokens: ndarray[tuple[Any, ...], dtype[integer[Any]]]
The slice of generated speech tokens valid so far.
streaming
streaming: bool = False
update_speech_tokens()
update_speech_tokens(new_tokens)
Updates the buffer with new speech tokens.