Python class

SpeechTokenGenerationPipeline

final class max.pipelines.SpeechTokenGenerationPipeline(pipeline_config, pipeline_model, eos_token_id, weight_adapters, tokenizer)

Bases: TextGenerationPipeline[TTSContext]

A text-to-speech token generation pipeline for TTS models.

Initialize a text generation pipeline instance.

This sets up devices, the inference session, tokenizer, KV-cache manager, sampling kernel, and loads model weights and adapters.

Parameters:

  • pipeline_config (PipelineConfig) – Configuration for the pipeline and runtime behavior.
  • pipeline_model (type[PipelineModel[TTSContext]]) – Concrete model implementation to use for execution.
  • eos_token_id (int) – Default EOS token id used when HF config does not supply one or to seed the EOS set.
  • weight_adapters (dict[WeightsFormat, WeightsAdapter]) – Mapping from weights format to adapter implementation.
  • tokenizer (PipelineTokenizer[Any, Any, Any]) – Tokenizer implementation used to build contexts and decode.

Raises:

ValueError – If quantization_encoding is not configured in pipeline_config.model or if structured output is requested without a valid tokenizer delegate.
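The `weight_adapters` parameter maps each weights format to the adapter that converts it into the model's expected layout. A minimal sketch of that mapping's shape, using plain-Python stand-ins (the function names and dict values here are hypothetical; the real `WeightsFormat` and `WeightsAdapter` types come from the MAX library):

```python
# Hypothetical stand-ins illustrating the shape of the
# `weight_adapters` argument: a mapping from weights format
# to an adapter callable for that format.
def safetensors_adapter(raw: bytes) -> dict:
    # Stand-in adapter: the real one converts a weights file
    # in this format into the model's expected layout.
    return {"format": "safetensors", "size": len(raw)}

def gguf_adapter(raw: bytes) -> dict:
    return {"format": "gguf", "size": len(raw)}

weight_adapters = {
    "safetensors": safetensors_adapter,
    "gguf": gguf_adapter,
}

# The pipeline would select the adapter matching the checkpoint format.
adapter = weight_adapters["safetensors"]
loaded = adapter(b"\x00" * 16)
```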

next_speech_token()

next_speech_token(batch, num_steps, tokens_to_generate)

Processes the batch and returns decoded tokens.

Given a batch, executes the graph for num_steps in a multi-step scenario, then decodes the generated tokens and returns them keyed by request ID.

Parameters:

  • batch – The batch of requests to process.
  • num_steps – Number of steps to execute the graph for in a single call.
  • tokens_to_generate – Number of tokens to generate for each request in the batch.
Return type:

dict[RequestID, TextGenerationOutput]
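The multi-step contract above (execute the graph for num_steps per request, then decode once and return results keyed by request ID) can be sketched with plain-Python stand-ins. The classes and the token-ID logic here are hypothetical, for illustrating the call and return shapes only; the real `TTSContext` and `TextGenerationOutput` types come from the MAX library:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the pipeline's context and output types.
@dataclass
class FakeContext:
    request_id: str
    tokens: list[int] = field(default_factory=list)

@dataclass
class FakeOutput:
    request_id: str
    tokens: list[int]
    final_status: str

def next_speech_token_sketch(batch, num_steps, tokens_to_generate):
    """Sketch of the documented contract: run up to `num_steps`
    generation steps per request, then return outputs keyed by
    request ID."""
    results: dict[str, FakeOutput] = {}
    for ctx in batch:
        budget = tokens_to_generate.get(ctx.request_id, num_steps)
        steps = min(num_steps, budget)
        # Stand-in for graph execution: append dummy token IDs.
        generated = [len(ctx.tokens) + i for i in range(steps)]
        ctx.tokens.extend(generated)
        results[ctx.request_id] = FakeOutput(
            request_id=ctx.request_id,
            tokens=generated,
            final_status="active" if len(ctx.tokens) < budget else "done",
        )
    return results
```

A caller would invoke this repeatedly, dropping requests whose status is no longer active, which mirrors how a multi-step serving loop consumes the returned dict.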