Python class
TextContext
class max.pipelines.TextContext(*, max_length, tokens, request_id=<factory>, eos_tracker=<factory>, log_probabilities=0, log_probabilities_echo=False, ignore_eos=False, json_schema=None, sampling_params=<factory>, model_name='', _matcher=None, status=GenerationStatus.ACTIVE, _log_probabilities_data=<factory>, _is_initial_prompt=True, _draft_offset=0, _spec_decoding_state=None, in_reasoning_phase=False, target_endpoint=None, external_block_metadata=None, cached_prefix_length=None, _cache_metrics_emitted=False)
Bases: object
A base class for model context, specifically for Text model variants.
This class manages the state and processing of text generation, including token management, caching, and generation parameters.
Parameters:
- max_length (int) – Maximum allowed length of the generated sequence.
- tokens (TokenBuffer) – NumPy array containing the token IDs.
- request_id (RequestID) – A unique identifier for this sequence.
- eos_tracker (EOSTracker) – Holds EOS configuration and performs checks for EOS conditions.
- log_probabilities (int) – Whether to return token log probabilities.
- log_probabilities_echo (bool) – Whether to return log probabilities for prompt tokens.
- ignore_eos (bool) – Whether to ignore end-of-sequence tokens and continue generating.
- json_schema (str | None) – Optional JSON schema for structured output.
- sampling_params (SamplingParams) – Parameters controlling the token sampling strategy.
- model_name (str)
- _matcher (Any | None)
- status (GenerationStatus)
- _log_probabilities_data (dict[int, LogProbabilities]) – Token log probabilities data.
- _is_initial_prompt (bool) – Whether this is the initial prompt encoding.
- _draft_offset (int) – Offset for draft decoding.
- _spec_decoding_state (SpecDecodingState | None) – Optional per-request speculative decoding state.
- in_reasoning_phase (bool)
- target_endpoint (str | None) – Optional target endpoint identifier for routing requests.
- external_block_metadata (Any)
- cached_prefix_length (int | None)
- _cache_metrics_emitted (bool)
advance_fsm()
advance_fsm(token)
Advance the FSM matcher state by one token.
This method advances only the FSM state for constrained decoding. It does NOT modify the token buffer. Use advance_token_buffer() separately if token buffer advancement is needed, or use update() for the common case of advancing both together.
Parameters:
token (int) – The token to consume in the FSM.
Returns:
True if the token was accepted by the matcher, False if no matcher is present.
Raises:
AssertionError – If the matcher rejects the token, indicating a mismatch between the bitmask and FSM state.
Return type:
bool
advance_token_buffer()
advance_token_buffer(new_token, log_probabilities=None)
Advance the token buffer without touching FSM state.
This method handles token buffer mutations, including:
- Chunked prefill advancement
- Log probability storage
- Token buffer advancement
- EOS/max-length status updates
It does NOT advance the FSM matcher. Use advance_fsm() separately if FSM advancement is needed, or use update() for the common case of advancing both together.
Parameters:
- new_token (int) – The token to append to the buffer.
- log_probabilities (LogProbabilities | None) – Optional log probabilities for this token.
Return type:
None
apply_processing_offset()
apply_processing_offset(offset)
Applies a processing offset to the token buffer.
Parameters:
offset (int)
Return type:
None
cached_prefix_length
Number of prompt tokens served from the KV prefix cache.
Set by the block manager when a request is admitted to a CE batch (0 if no matching prefix). BatchMetrics.create consumes the value to emit a per-request cache hit rate observation, and uses _cache_metrics_emitted to guard against re-emitting on chunked-prefill follow-up calls.
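As a hedged sketch of the metric described above (not the actual BatchMetrics implementation; the helper name and prompt_length parameter are assumptions), the per-request cache hit rate derived from cached_prefix_length amounts to:

```python
def prefix_cache_hit_rate(cached_prefix_length: int, prompt_length: int) -> float:
    """Fraction of prompt tokens served from the KV prefix cache."""
    if prompt_length <= 0:
        return 0.0
    return cached_prefix_length / prompt_length
```

A request with a 128-token prompt whose first 96 tokens match a cached prefix would report a hit rate of 0.75.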
compute_num_available_steps()
compute_num_available_steps(max_seq_len)
Computes the maximum number of generation steps available without exceeding max_seq_len, taking the current context length into account.
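The arithmetic here is simple; a minimal sketch (hypothetical free function, not the method's actual signature, which reads the current length from the context):

```python
def compute_num_available_steps(current_length: int, max_seq_len: int) -> int:
    # Steps remaining before the sequence would exceed max_seq_len.
    return max(max_seq_len - current_length, 0)
```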
eos_tracker
eos_tracker: EOSTracker
external_block_metadata
external_block_metadata: Any = None
Block metadata from the Orchestrator for distributed KV cache (dKV).
When set, the DKVConnector reads this during lookup() to determine which blocks are available in the external BlockStore system.
get_min_token_logit_mask()
get_min_token_logit_mask(num_steps)
Returns per-step masks for logits that should be masked (e.g., EOS during min_tokens).
This is primarily used for the min_tokens setting, where EOS tokens are masked in the logits to avoid generating them before min_tokens is reached.
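A minimal sketch of the idea, assuming NumPy boolean masks and a hypothetical eos_token_ids set (the real method's mask format and signature may differ):

```python
import numpy as np

def min_token_logit_masks(num_generated, min_tokens, eos_token_ids, num_steps, vocab_size):
    """One boolean mask per step; True marks logits to suppress."""
    masks = []
    for step in range(num_steps):
        mask = np.zeros(vocab_size, dtype=bool)
        # Suppress EOS only while the min_tokens floor has not yet been reached.
        if num_generated + step < min_tokens:
            mask[list(eos_token_ids)] = True
        masks.append(mask)
    return masks
```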
ignore_eos
ignore_eos: bool = False
in_reasoning_phase
in_reasoning_phase: bool = False
Whether the latest committed tokens are inside a <think>...</think> block. Toggled host-side after each commit when a reasoning parser is configured.
is_done
property is_done: bool
Whether text generation has finished.
is_initial_prompt
property is_initial_prompt: bool
Returns True if the context has not yet been updated with any tokens.
json_schema
log_probabilities
log_probabilities: int = 0
log_probabilities_echo
log_probabilities_echo: bool = False
matcher
property matcher: LLMatcher | None
The optional grammar matcher for constrained decoding.
max_length
max_length: int
min_tokens
property min_tokens: int
The minimum number of new tokens to generate.
model_name
model_name: str = ''
realize_future_token()
realize_future_token(new_token, log_probabilities=None)
Overwrite the placeholder future token with the actual token.
This is primarily used for overlap scheduling.
Parameters:
- new_token (int)
- log_probabilities (LogProbabilities | None)
Return type:
None
request_id
request_id: RequestID
reset()
reset()
Resets the context's state by combining all tokens into a new prompt.
Return type:
None
sampling_params
sampling_params: SamplingParams
set_matcher()
set_matcher(matcher)
Sets the grammar matcher for constrained decoding.
Parameters:
matcher (LLMatcher)
Return type:
None
spec_decoding_state
property spec_decoding_state: SpecDecodingState
Gets or creates the per-request speculative decoding state.
status
status: GenerationStatus = 'active'
target_endpoint
to_generation_output()
to_generation_output()
Get completion tokens that are ready to be returned to the user.
This method retrieves tokens that have been generated but not yet delivered to the user, along with their associated log probability data.
Returns:
The completion tokens and their associated log probabilities, if available.
Return type:
tokens
tokens: TokenBuffer
update()
update(new_token, log_probabilities=None)
Advance both the token buffer and the FSM state.
This is the standard single-step update that most callers should use. It combines advance_token_buffer() and advance_fsm() for the common case where both need to be advanced together.
For multi-step execution where the FSM is advanced separately (e.g., to compute bitmasks between steps), use the individual methods directly.
Parameters:
- new_token (int) – The token to append and consume.
- log_probabilities (LogProbabilities | None) – Optional log probabilities for this token.
Return type:
None
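To illustrate the buffer/FSM split, here is a toy analog (not the MAX API; class and attribute names are invented) in which update() simply composes the two finer-grained steps:

```python
class ToyTextContext:
    """Toy analog of the token-buffer / FSM split; illustrative only."""

    def __init__(self):
        self.tokens = []       # stands in for the token buffer
        self.fsm_position = 0  # stands in for the FSM matcher state

    def advance_token_buffer(self, new_token):
        self.tokens.append(new_token)

    def advance_fsm(self, token):
        self.fsm_position += 1
        return True  # a real matcher could reject the token here

    def update(self, new_token):
        # The common case: advance both together, as update() does.
        self.advance_token_buffer(new_token)
        self.advance_fsm(new_token)
```

A multi-step caller that needs to compute a bitmask between steps would call advance_token_buffer() and advance_fsm() individually instead of update().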
update_with_future_token()
update_with_future_token()
Append a placeholder future token to the generated tokens.
This is primarily used for overlap scheduling. For structured output contexts (those with a matcher), only the token buffer is advanced; the FSM will be advanced later when the future token is realized with the actual generated token.
Return type:
None
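The placeholder-then-realize pattern used for overlap scheduling can be sketched on a plain list (PLACEHOLDER and these helpers are hypothetical, not the library's):

```python
PLACEHOLDER = -1  # hypothetical sentinel for a not-yet-generated token

def update_with_future_token(tokens):
    # Reserve a slot so the next step can be scheduled before the token exists.
    tokens.append(PLACEHOLDER)

def realize_future_token(tokens, new_token):
    # Overwrite the placeholder with the actual generated token.
    assert tokens and tokens[-1] == PLACEHOLDER
    tokens[-1] = new_token
```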