Python class
TextGenerationContext
class max.interfaces.TextGenerationContext(*args, **kwargs)
Bases: BaseContext, Protocol
Protocol defining the interface for text generation contexts in token generation.
A TextGenerationContext represents model inputs for text generation pipelines, managing
the state of tokens throughout the generation process. It handles token arrays,
generation status, sampling parameters, and various indices that track different
stages of token processing.
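For orientation, here is a minimal, hypothetical sketch of how a caller might drive a context through a generation loop. `ctx` (an object implementing this protocol) and `sample_next_token` are placeholders, not part of the API; a real loop would also consult the context's generation status to stop on EOS.

```python
from max.interfaces import TextGenerationContext

def run_generation(ctx: TextGenerationContext, sample_next_token, max_seq_len: int):
    # Cap the loop by how many steps still fit under the length limit.
    for _ in range(ctx.compute_num_available_steps(max_seq_len)):
        new_token = sample_next_token(ctx)  # placeholder: model forward + sampling
        ctx.update(new_token)               # advance token buffer and FSM together
        # A real loop would also break once the context reports EOS/max-length.
    # Package generated tokens and metadata into a standard output object.
    return ctx.to_generation_output()
```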
advance_fsm()
advance_fsm(token)
Advance the FSM matcher state by one token.
This method advances only the FSM state for constrained decoding.
It does NOT modify the token buffer. Use advance_token_buffer()
separately if token buffer advancement is needed, or use update()
for the common case of advancing both together.
advance_token_buffer()
advance_token_buffer(new_token, log_probabilities=None, mark_previous_as_processed=True)
Advance the token buffer without touching FSM state.
This method handles token buffer mutations including log probability storage, token buffer advancement, and EOS/max-length status updates. It does NOT advance the FSM matcher.
Use advance_fsm() separately if FSM advancement is needed, or use
update() for the common case of advancing both together.
Parameters:
- new_token (int) – The token to append to the buffer.
- log_probabilities (LogProbabilities | None) – Optional log probabilities for this token.
- mark_previous_as_processed (bool) – If True, mark previous tokens as processed. If False, keep them unprocessed so they are returned to the user (used for jump-ahead tokens).

Return type: None
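A hedged sketch of the split path this method enables: the FSM is advanced first so the matcher can report constraints for the next step, and the buffer is committed separately. `ctx` and `sampled_tokens` are placeholders, and the bitmask step is only indicated in a comment.

```python
def multi_step_advance(ctx, sampled_tokens):
    for tok in sampled_tokens:
        # Advance only the FSM so the matcher can report which tokens are
        # legal at the *next* step (e.g., to build a logit bitmask).
        ctx.advance_fsm(tok)
        # ... derive/apply the next-step bitmask from ctx.matcher here ...
        # Commit the token to the buffer without re-advancing the FSM.
        ctx.advance_token_buffer(tok)
    # For a single step with no intermediate bitmask work, ctx.update(tok)
    # performs both advances at once.
```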
cached_prefix_length
The number of prompt tokens served from the KV prefix cache when the request was first admitted.
Set by the block manager when a request is admitted to a CE (context encoding) batch (0 if the cache had no matching prefix). BatchMetrics.create consumes this value to emit a per-request cache hit rate observation, then resets it to None so that chunked-prefill follow-up calls do not re-emit it.
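As an illustration of the arithmetic only, a metrics layer might derive the hit-rate observation roughly as follows; `ctx`, `prompt_len`, and `record_cache_hit_rate` are placeholders.

```python
if ctx.cached_prefix_length is not None:
    # Fraction of the prompt served from the KV prefix cache on admission.
    hit_rate = ctx.cached_prefix_length / prompt_len
    record_cache_hit_rate(hit_rate)  # placeholder metrics hook
```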
compute_num_available_steps()
compute_num_available_steps(max_seq_len)
Computes the maximum number of generation steps available.
This method calculates how many tokens can be generated without exceeding the specified maximum sequence length limit.
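For example, a scheduler might clamp its per-batch step budget with this method. This is a hypothetical sketch; `ctx` and `batch_step_budget` are placeholders.

```python
# How many tokens this sequence can still generate under a 4096-token limit.
available = ctx.compute_num_available_steps(max_seq_len=4096)
# A scheduler would typically take the smaller of this and its batch budget.
num_steps = min(available, batch_step_budget)
```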
eos_tracker
property eos_tracker: EOSTracker
Holds EOS-related settings for this sequence and performs EOS/stop checks.
Returns:
The EOSTracker for this sequence.
get_min_token_logit_mask()
get_min_token_logit_mask(num_steps)
Returns the token indices that should be masked in the output logits.
This method is primarily used to implement the min_tokens constraint,
where certain tokens (typically EOS tokens) are masked to prevent early
termination before the minimum token count is reached.
Parameters:
- num_steps (int) – The number of generation steps to compute masks for.

Returns:
A list of NumPy arrays, where each array contains token indices that should be masked (set to negative infinity) in the logits for the corresponding generation step.

Return type: list[numpy.ndarray]
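A hedged sketch of applying these masks to a logits array of shape `[num_steps, vocab_size]`; `ctx` is a placeholder, and the zero-filled `logits` stands in for real model output.

```python
import numpy as np

num_steps = 4
vocab_size = 32_000                         # placeholder vocabulary size
logits = np.zeros((num_steps, vocab_size))  # stand-in for model output logits

for step, token_ids in enumerate(ctx.get_min_token_logit_mask(num_steps)):
    # Suppress EOS-like tokens until the min_tokens constraint is satisfied.
    logits[step, token_ids] = -np.inf
```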
is_initial_prompt
property is_initial_prompt: bool
Whether this context contains only the initial prompt.
This property indicates if the context has not yet been updated with any generated tokens and still contains only the original input.
Returns:
True if no tokens have been generated yet, False if generation has begun and tokens have been added.
json_schema
The JSON schema for constrained decoding, if configured.
When set, this schema constrains token generation to produce valid JSON output that conforms to the specified structure.
Returns:
The JSON schema string, or None if no schema constraint is active.
jump_ahead()
jump_ahead(new_token)
Jump ahead in generation by adding a token and updating indices.
This method is used in speculative decoding scenarios to quickly advance the generation state when draft tokens are accepted.
Parameters:
- new_token (int) – The token ID to add when jumping ahead in the sequence.

Return type: None
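For instance, a speculative-decoding verifier might commit accepted draft tokens like this (hypothetical sketch; `ctx` and `accepted_draft_tokens` are placeholders):

```python
# Commit each draft token the target model accepted, advancing the
# generation state past the verified tokens.
for tok in accepted_draft_tokens:
    ctx.jump_ahead(tok)
```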
log_probabilities
property log_probabilities: int
The number of top tokens to return log probabilities for.
When greater than 0, the system returns log probabilities for the top N most likely tokens at each generation step.
Returns:
The number of top tokens to include in log probability output. Returns 0 if log probabilities are disabled.
log_probabilities_echo
property log_probabilities_echo: bool
Whether to include input tokens in the returned log probabilities.
When True, log probabilities will be computed and returned for input
(prompt) tokens in addition to generated tokens.
Returns:
True if input tokens should be included in log probability output, False otherwise.
matcher
The grammar matcher for structured output generation, if configured.
The matcher enforces structural constraints (like JSON schema) during generation to ensure valid formatted output.
Returns:
The grammar matcher instance, or None if no structured generation is configured for this context.
max_length
The maximum allowed length for this sequence.
When set, generation will stop when this length is reached, regardless of other stopping criteria.
Returns:
The maximum sequence length limit, or None if no limit is set.
min_tokens
property min_tokens: int
The minimum number of new tokens that must be generated.
Generation will continue until at least this many new tokens have been produced, even if other stopping criteria are met (for example, EOS tokens).
Returns:
The minimum number of new tokens to generate.
realize_future_token()
realize_future_token(new_token, log_probabilities=None)
Overwrite the placeholder future token with the actual token.
This is primarily used for overlap scheduling.
Parameters:
- new_token (int) – The actual token that replaces the placeholder.
- log_probabilities (LogProbabilities | None) – Optional log probabilities for this token.

Return type: None
reset()
reset()
Resets the context’s state by combining all tokens into a new prompt.
This method is used when a request is evicted, meaning that the context needs to be re-encoded in the following CE iteration.
Return type: None
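A hypothetical eviction path, for illustration; `requeue_for_context_encoding` is a placeholder for whatever the scheduler does next.

```python
def evict(ctx):
    # Fold prompt + generated tokens into a single new prompt; the next CE
    # iteration re-encodes the full sequence from scratch.
    ctx.reset()
    requeue_for_context_encoding(ctx)  # placeholder scheduler hook
```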
sampling_params
property sampling_params: SamplingParams
The sampling parameters configured for this generation request.
These parameters control how tokens are selected during generation, including temperature, top-k/top-p filtering, and stopping criteria.
Returns:
The SamplingParams instance containing all sampling configuration for this context.
set_matcher()
set_matcher(matcher)
Set a grammar matcher for constrained decoding.
This method configures structured output generation by installing a grammar matcher that enforces format constraints during token generation.
Parameters:
- matcher (Any) – The grammar matcher instance to use for constraining output. The specific type depends on the structured generation backend.

Return type: None
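A hedged sketch of wiring up constrained decoding from a configured JSON schema; `build_matcher` stands in for whatever the structured-generation backend provides.

```python
if ctx.json_schema is not None:
    matcher = build_matcher(ctx.json_schema)  # placeholder backend call
    ctx.set_matcher(matcher)
```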
spec_decoding_state
property spec_decoding_state: SpecDecodingState
Returns the speculative decoding state.
to_generation_output()
to_generation_output()
Converts this context to a TextGenerationOutput object.
Provides a standardized way to extract the final output of the text generation process from the context, including generated text, tokens, and any associated metadata.
Returns:
The output object containing the results of the text generation for this context.

Return type: TextGenerationOutput
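For example, once generation finishes:

```python
output = ctx.to_generation_output()
# `output` is a TextGenerationOutput carrying the generated tokens and any
# associated metadata for this request.
```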
tokens
property tokens: TokenBuffer
The token buffer for the context.
update()
update(new_token, log_probabilities=None)
Advance both token buffer and FSM state.
This is the standard single-step update that most callers should use.
It combines advance_token_buffer() and advance_fsm() for the
common case where both need to be advanced together.
For multi-step execution where FSM is advanced separately (e.g., to compute bitmasks between steps), use the individual methods directly.
Parameters:
- new_token (int) – The token ID to add to the generation sequence.
- log_probabilities (LogProbabilities | None) – Optional log probability data for the new token and alternatives. Used for analysis and debugging.

Return type: None
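A short sketch of the common single-step path, optionally attaching log probabilities; `ctx`, `new_token`, and `step_logprobs` (a `LogProbabilities` value) are placeholders.

```python
# Advance both the token buffer and the FSM in one call.
ctx.update(new_token, log_probabilities=step_logprobs)
```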
update_with_future_token()
update_with_future_token()
Append a placeholder future token to the generated tokens.
This is primarily used for overlap scheduling.
Return type: None
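Taken together with realize_future_token(), a hedged sketch of the overlap-scheduling pairing; `ctx` and `await_next_forward_result` are placeholders for the surrounding async machinery.

```python
# Reserve a slot for the token the in-flight forward pass will produce, so
# scheduling can proceed without waiting for the result.
ctx.update_with_future_token()
actual_token = await_next_forward_result()  # placeholder async wait
# Patch the placeholder with the real token once it arrives.
ctx.realize_future_token(actual_token)
```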