
Python module

core

TTSContext

class max.pipelines.core.TTSContext(audio_prompt_tokens=<factory>, buffer_speech_tokens=None, audio_buffer=None, prev_samples_beyond_offset=0, streaming=False, _speech_token_size=128, _speech_token_end_idx=0, _speech_tokens=<factory>, _decoded_index=0, _block_counter=0, _arrival_time=<factory>, _audio_generation_status=GenerationStatus.ACTIVE, *, request_id=<factory>, max_length, tokens, eos_token_ids=<factory>, eos_sequences=<factory>, log_probabilities=0, log_probabilities_echo=False, ignore_eos=False, json_schema=None, sampling_params=<factory>, model_name='', _matcher=None, _status=GenerationStatus.ACTIVE, _size=-1, _start_idx=0, _active_idx=-1, _end_idx=-1, _completion_start_idx=-1, _completion_end_idx=-1, _prompt_len=-1, _committed_idx=0, _log_probabilities_data=<factory>, _is_initial_prompt=True, _draft_offset=0)

A context for Text-to-Speech (TTS) model inference.

This class extends TextContext to handle speech token generation and management. It maintains buffers for audio prompt tokens and generated speech tokens, along with tracking indices for decoding progress.

Parameters:

  • audio_prompt_tokens (ndarray) – Array of input audio prompt tokens used for voice cloning
  • buffer_speech_tokens (ndarray | None)
  • audio_buffer (ndarray | None)
  • prev_samples_beyond_offset (int)
  • streaming (bool) – Whether the request is streaming audio to the client
  • _speech_token_size (int) – Size of the speech token buffer; defaults to 128
  • _speech_token_end_idx (int) – Index marking the end of valid speech tokens
  • _speech_tokens (ndarray) – Buffer containing the generated speech tokens
  • _decoded_index (int) – Index tracking how many tokens have been decoded to audio
  • _block_counter (int) – Counter tracking number of speech token blocks generated
  • _arrival_time (float)
  • _audio_generation_status (GenerationStatus)
  • request_id (str)
  • max_length (int)
  • tokens (ndarray)
  • eos_token_ids (set[int])
  • eos_sequences (list[list[int]])
  • log_probabilities (int)
  • log_probabilities_echo (bool)
  • ignore_eos (bool)
  • json_schema (str | None)
  • sampling_params (SamplingParams)
  • model_name (str)
  • _matcher (Any | None)
  • _status (GenerationStatus)
  • _size (int)
  • _start_idx (int)
  • _active_idx (int)
  • _end_idx (int)
  • _completion_start_idx (int)
  • _completion_end_idx (int)
  • _prompt_len (int)
  • _committed_idx (int)
  • _log_probabilities_data (dict[int, LogProbabilities])
  • _is_initial_prompt (bool)
  • _draft_offset (int)
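
A minimal construction sketch. Per the signature above, only tokens and max_length are required keywords; the token IDs, dtype, and other values below are illustrative assumptions:

```python
import numpy as np

from max.pipelines.core import TTSContext

# Hypothetical token IDs; a real pipeline would obtain these from its tokenizer.
prompt_tokens = np.array([101, 7592, 2088, 102], dtype=np.int64)

ctx = TTSContext(
    tokens=prompt_tokens,   # text prompt token IDs (required)
    max_length=1024,        # maximum sequence length (required)
    # Optional voice-cloning prompt; empty by default.
    audio_prompt_tokens=np.array([3, 14, 159], dtype=np.int64),
    streaming=True,         # stream audio chunks back to the client
)
```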

audio_buffer

audio_buffer: ndarray | None

audio_generation_status

property audio_generation_status: GenerationStatus

audio_prompt_tokens

audio_prompt_tokens: ndarray

block_counter

property block_counter: int

buffer_speech_tokens

buffer_speech_tokens: ndarray | None

decoded_index

property decoded_index: int

has_undecoded_speech_tokens()

has_undecoded_speech_tokens(exclude_last_n=0)

Checks whether there are undecoded speech tokens.

Parameters:

exclude_last_n (int) – Number of tokens to exclude from the end when checking for undecoded tokens. For example, if set to 1, the last token will not be considered when checking for undecoded tokens.

Returns:

True if there are undecoded speech tokens (excluding the last n tokens), False otherwise.

Return type:

bool

is_done

property is_done: bool

next_speech_tokens()

next_speech_tokens(audio_chunk_size=None, buffer=None)

Returns a chunk of the next unseen speech tokens.

Calling this function does not advance the index of the last seen token; call set_decoded_index after the chunk is processed (see the sketch following set_decoded_index below).

Parameters:

  • audio_chunk_size (int | None) – The number of speech tokens to return.
  • buffer (int | None) – The number of previous speech tokens to pass to the audio decoder on each generation step.

Returns:

A tuple of (chunk of speech tokens, buffer).

Return type:

tuple[ndarray, int]

prev_samples_beyond_offset

prev_samples_beyond_offset: int

set_decoded_index()

set_decoded_index(idx)

Parameters:

idx (int)

Return type:

None
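
Together, has_undecoded_speech_tokens, next_speech_tokens, and set_decoded_index support a chunked decode loop. A hedged sketch, where decode_to_audio stands in for a hypothetical audio decoder and the chunk size is illustrative; the index arithmetic assumes the returned chunk prepends buffer previously decoded tokens:

```python
# Drain undecoded speech tokens in fixed-size chunks.
while ctx.has_undecoded_speech_tokens():
    chunk, buffer = ctx.next_speech_tokens(audio_chunk_size=128, buffer=16)
    audio = decode_to_audio(chunk)  # hypothetical audio decoder call
    # next_speech_tokens does not advance the read position, so commit it
    # explicitly once the chunk has been processed. Assumption: the first
    # `buffer` tokens of `chunk` were already decoded on a previous step.
    ctx.set_decoded_index(ctx.decoded_index + len(chunk) - buffer)
```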

speech_token_status

property speech_token_status: GenerationStatus

Returns the status of the speech token generation.

speech_tokens

property speech_tokens: ndarray

status

property status: GenerationStatus

streaming

streaming: bool

update_audio_generation_status()

update_audio_generation_status(status)

Parameters:

status (GenerationStatus)

Return type:

None

update_speech_token_status()

update_speech_token_status(status)

Parameters:

status (GenerationStatus)

Return type:

None

update_speech_tokens()

update_speech_tokens(new_tokens)

Updates the stored speech tokens with new_tokens.

Parameters:

new_tokens (ndarray)

Return type:

None
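
On the producer side, newly sampled speech tokens can be appended with update_speech_tokens, after which the decode loop sketched above will observe them. A short example with illustrative token values:

```python
import numpy as np

# Hypothetical freshly sampled speech tokens.
new_tokens = np.array([742, 61, 903], dtype=np.int64)
ctx.update_speech_tokens(new_tokens)

assert ctx.has_undecoded_speech_tokens()
print(ctx.speech_tokens.shape, ctx.speech_token_status)
```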

update_status()

update_status(status)

Parameters:

status (GenerationStatus)

Return type:

None

TextAndVisionContext

class max.pipelines.core.TextAndVisionContext(*, request_id=<factory>, max_length, tokens, eos_token_ids=<factory>, eos_sequences=<factory>, log_probabilities=0, log_probabilities_echo=False, ignore_eos=False, json_schema=None, sampling_params=<factory>, model_name='', _matcher=None, _status=GenerationStatus.ACTIVE, _size=-1, _start_idx=0, _active_idx=-1, _end_idx=-1, _completion_start_idx=-1, _completion_end_idx=-1, _prompt_len=-1, _committed_idx=0, _log_probabilities_data=<factory>, _is_initial_prompt=True, _draft_offset=0, pixel_values=(), extra_model_args=<factory>, _needs_vision_encoding=True)

A base class for model context, specifically for Vision model variants.

Parameters:

  • pixel_values (tuple[ndarray, ...])
  • extra_model_args (dict[str, ndarray])
  • _needs_vision_encoding (bool)

extra_model_args

extra_model_args: dict[str, ndarray]

needs_vision_encoding

property needs_vision_encoding: bool

Gets whether vision encoding is needed for this context.
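
A hedged sketch of how a pipeline might gate its vision encoder on this flag; run_vision_encoder is a hypothetical stand-in:

```python
if vision_ctx.needs_vision_encoding:
    # Encode the attached images once; pixel_values holds one array per image.
    image_embeddings = run_vision_encoder(vision_ctx.pixel_values)  # hypothetical
```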

pixel_values

pixel_values: tuple[ndarray, ...]

reset()

reset()

Resets the context’s state by combining all tokens into a new prompt.

Return type:

None

update()

update(new_token, log_probabilities=None)

Updates the next_tokens and extends existing tokens to include all generated tokens.

Parameters:

  • new_token (int)
  • log_probabilities (LogProbabilities | None)

Return type:

None

TextContext

class max.pipelines.core.TextContext(*, request_id=<factory>, max_length, tokens, eos_token_ids=<factory>, eos_sequences=<factory>, log_probabilities=0, log_probabilities_echo=False, ignore_eos=False, json_schema=None, sampling_params=<factory>, model_name='', _matcher=None, _status=GenerationStatus.ACTIVE, _size=-1, _start_idx=0, _active_idx=-1, _end_idx=-1, _completion_start_idx=-1, _completion_end_idx=-1, _prompt_len=-1, _committed_idx=0, _log_probabilities_data=<factory>, _is_initial_prompt=True, _draft_offset=0)

A base class for model context, specifically for Text model variants.

This class manages the state and processing of text generation, including token management, caching, and generation parameters.

Parameters:

  • request_id (str) – A unique identifier for this sequence.
  • max_length (int) – Maximum allowed length of the generated sequence
  • tokens (ndarray) – NumPy array containing the token IDs
  • eos_token_ids (set[int]) – Set of token IDs that indicate end of sequence
  • eos_sequences (list[list[int]])
  • log_probabilities (int) – Whether to return token log probabilities
  • log_probabilities_echo (bool) – Whether to return log probabilities for prompt tokens
  • ignore_eos (bool) – Whether to ignore end of sequence tokens and continue generating
  • json_schema (str | None) – Optional JSON schema for structured output
  • sampling_params (SamplingParams) – Parameters controlling the token sampling strategy
  • model_name (str)
  • _matcher (Any | None)
  • _status (GenerationStatus) – Current generation status (active, finished, etc)
  • _size (int) – Current allocated size of token array
  • _start_idx (int) – Start index of current generation window
  • _active_idx (int) – Current position in token sequence
  • _end_idx (int) – End index of valid tokens
  • _completion_start_idx (int) – Start index of completion tokens
  • _completion_end_idx (int) – End index of completion tokens
  • _prompt_len (int) – Length of original prompt
  • _committed_idx (int) – Index up to which tokens are committed
  • _log_probabilities_data (dict[int, LogProbabilities]) – Token log probabilities data
  • _is_initial_prompt (bool) – Whether this is the initial prompt encoding
  • _draft_offset (int) – Offset for draft decoding
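
A hedged end-to-end sketch of the bookkeeping this class provides during decoding. sample_next_token stands in for a hypothetical model-plus-sampler call, and the token values are illustrative:

```python
import numpy as np

from max.pipelines.core import TextContext

ctx = TextContext(
    tokens=np.array([1, 15043, 3186], dtype=np.int64),  # hypothetical prompt token IDs
    max_length=64,
    eos_token_ids={2},  # hypothetical EOS token ID
)

while not ctx.is_done:
    pending = ctx.next_tokens               # tokens awaiting processing
    new_token = sample_next_token(pending)  # hypothetical model + sampler call
    ctx.update(new_token)

# Tokens (and any log probabilities) not yet returned to the caller.
for token, logprobs in ctx.outstanding_completion_tokens():
    print(token, logprobs)
```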

active_idx

property active_idx: int

active_length

property active_length: int

Current sequence length: the number of tokens input this iteration.

This is the prompt size during context encoding, and typically 1 (or more) during token generation.

all_tokens

property all_tokens: ndarray

bump_token_indices()

bump_token_indices(start_idx=0, active_idx=0, end_idx=0, committed_idx=0)

Update the start_idx, active_idx, end_idx, and committed_idx without manipulating the token array.

Parameters:

  • start_idx (int)
  • active_idx (int)
  • end_idx (int)
  • committed_idx (int)

Return type:

None
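
The name suggests that bump_token_indices applies relative offsets, while set_token_indices (below) assigns absolute positions. A sketch under that assumption:

```python
# Assumed semantics: bump adds deltas to the current indices...
ctx.bump_token_indices(active_idx=1, end_idx=1)
# ...while set_token_indices overwrites them with absolute values.
ctx.set_token_indices(start_idx=0, active_idx=ctx.end_idx)
```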

committed_idx

property committed_idx: int

compute_num_available_steps()

compute_num_available_steps(max_seq_len)

Compute the max number of steps we can execute for a given context without exceeding the max_seq_len.

Parameters:

max_seq_len (int)

Return type:

int
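
A hedged illustration, assuming the available step count is max_seq_len minus the current sequence length:

```python
# With current_length == 10 and max_seq_len == 16, at most 6 further steps
# fit before the sequence would exceed max_seq_len (assumed semantics).
num_steps = ctx.compute_num_available_steps(max_seq_len=16)
steps_to_run = min(num_steps, 8)  # cap by a hypothetical per-batch step budget
```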

current_length

property current_length: int

The current length of the sequence, including completed and active tokens.

end_idx

property end_idx: int

eos_sequences

eos_sequences: list[list[int]]

eos_token_ids

eos_token_ids: set[int]

generated_tokens

property generated_tokens: ndarray

Returns all tokens that have been generated after the prompt.

Returns:

Array of generated tokens from prompt_len to end_idx.

Return type:

np.ndarray

get_min_token_logit_mask()

get_min_token_logit_mask(num_steps)

Returns the indices of output tokens that should be masked.

This is primarily used for the min_tokens setting: EOS tokens are masked in the logits so they are not generated before min_tokens is reached.

Parameters:

num_steps (int)

Returns:

The indices of output tokens that should be masked.

Return type:

list[ndarray[Any, dtype[int32]]]
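
A hedged sketch of applying the returned masks, assuming the list holds one index array per step containing vocabulary positions to suppress; logits is a hypothetical [num_steps, vocab_size] array:

```python
import numpy as np

masks = ctx.get_min_token_logit_mask(num_steps=4)
for step, idx in enumerate(masks):
    # Suppress EOS (and similar) tokens until min_tokens is reached.
    logits[step, idx] = -np.inf
```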

ignore_eos

ignore_eos: bool

is_ce

property is_ce: bool

Returns whether this context is in context encoding (CE) mode.

CE mode indicates that the context has more than one active token to process, typically during the initial encoding of a prompt or after a rollback.

Returns:

True if in CE mode (active_length > 1), False otherwise.

Return type:

bool

is_done

property is_done: bool

is_initial_prompt

property is_initial_prompt: bool

Returns True if the context has not been updated with tokens.

json_schema

json_schema: str | None

jump_ahead()

jump_ahead(new_token)

Updates the token array, while ensuring the new token is returned to the user.

Parameters:

new_token (int)

Return type:

None

log_probabilities

log_probabilities: int

log_probabilities_echo

log_probabilities_echo: bool

matcher

property matcher: llguidance.LLMatcher | None

max_length

max_length: int

min_tokens

property min_tokens: int

The minimum number of new tokens to generate.

model_name

model_name: str

next_tokens

property next_tokens: ndarray

Returns the tokens between start_idx and active_idx.

Returns:

Array of tokens that have been generated but not yet processed.

Return type:

np.ndarray

outstanding_completion_tokens()

outstanding_completion_tokens()

Return the list of outstanding completion tokens and log probabilities that must be returned to the user.

Return type:

list[tuple[int, LogProbabilities | None]]

prompt_tokens

property prompt_tokens: ndarray

Returns the original prompt tokens.

Returns:

Array of tokens from the initial prompt.

Return type:

np.ndarray

request_id

request_id: str

reset()

reset()

Resets the context’s state by combining all tokens into a new prompt.

Return type:

None

rollback()

rollback(idx)

Parameters:

idx (int)

Return type:

None

sampling_params

sampling_params: SamplingParams

set_draft_offset()

set_draft_offset(idx)

Sets the draft offset index used for speculative decoding.

Parameters:

idx (int) – The index to set as the draft offset.

Return type:

None

set_matcher()

set_matcher(matcher)

Parameters:

matcher (llguidance.LLMatcher)

Return type:

None

set_token_indices()

set_token_indices(start_idx=None, active_idx=None, end_idx=None, committed_idx=None)

Set the token indices without manipulating the token array.

Parameters:

  • start_idx (int | None)
  • active_idx (int | None)
  • end_idx (int | None)
  • committed_idx (int | None)

Return type:

None

start_idx

property start_idx: int

status

property status: GenerationStatus

tokens

tokens: ndarray

update()

update(new_token, log_probabilities=None)

Updates the next_tokens and extends existing tokens to include all generated tokens.

Parameters:

  • new_token (int)
  • log_probabilities (LogProbabilities | None)

Return type:

None

update_status()

update_status(status)

Parameters:

status (GenerationStatus)

Return type:

None
