Python class

SpeechTokenGenerationPipeline

final class max.pipelines.SpeechTokenGenerationPipeline(pipeline_config, pipeline_model, eos_token_id, weight_adapters, tokenizer)

Bases: TextGenerationPipeline[TTSContext]

A text-to-speech token generation pipeline for TTS models.

Initialize a text generation pipeline instance.

This sets up devices, the inference session, tokenizer, KV-cache manager, sampling kernel, and loads model weights and adapters.

Parameters:

  • pipeline_config (PipelineConfig) – Configuration for the pipeline and runtime behavior.
  • pipeline_model (type[PipelineModel[TTSContext]]) – Concrete model implementation to use for execution.
  • eos_token_id (int) – Default EOS token id used when HF config does not supply one or to seed the EOS set.
  • weight_adapters (dict[WeightsFormat, WeightsAdapter]) – Mapping from weights format to adapter implementation.
  • tokenizer (PipelineTokenizer[Any, Any, Any]) – Tokenizer implementation used to build contexts and decode.

Raises:

ValueError – If quantization_encoding is not configured in pipeline_config.model or if structured output is requested without a valid tokenizer delegate.
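The `weight_adapters` parameter maps each weights format to the adapter that converts it into the model's expected layout. A minimal sketch of that mapping's shape, using plain-Python stand-ins (the function names and dict values here are hypothetical; the real `WeightsFormat` and `WeightsAdapter` types come from the MAX library):

```python
# Hypothetical stand-ins illustrating the shape of the
# `weight_adapters` argument: a mapping from weights format
# to an adapter callable for that format.
def safetensors_adapter(raw: bytes) -> dict:
    # Stand-in adapter: the real one converts a weights file
    # in this format into the model's expected layout.
    return {"format": "safetensors", "size": len(raw)}

def gguf_adapter(raw: bytes) -> dict:
    return {"format": "gguf", "size": len(raw)}

weight_adapters = {
    "safetensors": safetensors_adapter,
    "gguf": gguf_adapter,
}

# The pipeline would select the adapter matching the checkpoint format.
adapter = weight_adapters["safetensors"]
loaded = adapter(b"\x00" * 16)
```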

next_speech_token()

next_speech_token(batch, num_steps, tokens_to_generate)

Processes the batch and returns decoded tokens.

Given a batch, executes the graph for num_steps in a multi-step scenario, then decodes the generated tokens and returns them keyed by request ID.

Parameters:

  • batch – The batch of requests to process.
  • num_steps – Number of steps to execute the graph for in a single call.
  • tokens_to_generate – Number of tokens to generate for each request in the batch.
Return type:

dict[RequestID, TextGenerationOutput]
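The multi-step contract above (execute the graph for num_steps per request, then decode once and return results keyed by request ID) can be sketched with plain-Python stand-ins. The classes and the token-ID logic here are hypothetical, for illustrating the call and return shapes only; the real `TTSContext` and `TextGenerationOutput` types come from the MAX library:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the pipeline's context and output types.
@dataclass
class FakeContext:
    request_id: str
    tokens: list[int] = field(default_factory=list)

@dataclass
class FakeOutput:
    request_id: str
    tokens: list[int]
    final_status: str

def next_speech_token_sketch(batch, num_steps, tokens_to_generate):
    """Sketch of the documented contract: run up to `num_steps`
    generation steps per request, then return outputs keyed by
    request ID."""
    results: dict[str, FakeOutput] = {}
    for ctx in batch:
        budget = tokens_to_generate.get(ctx.request_id, num_steps)
        steps = min(num_steps, budget)
        # Stand-in for graph execution: append dummy token IDs.
        generated = [len(ctx.tokens) + i for i in range(steps)]
        ctx.tokens.extend(generated)
        results[ctx.request_id] = FakeOutput(
            request_id=ctx.request_id,
            tokens=generated,
            final_status="active" if len(ctx.tokens) < budget else "done",
        )
    return results
```

A caller would invoke this repeatedly, dropping requests whose status is no longer active, which mirrors how a multi-step serving loop consumes the returned dict.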