Python module
context
InputContext
class max.pipelines.context.InputContext(*args, **kwargs)
A base class for model contexts, representing model inputs for TokenGenerators.
Token array layout:

```
+----------- full prompt ------------+                     CHUNK_SIZE*N
                                                                       v
+--------------------+---------------+-----------------+----------------+
| completed          | next_tokens   |                 | preallocated   |
+--------------------+---------------+-----------------+----------------+
           start_idx ^    active_idx ^         end_idx ^
```
- completed: The tokens that have already been processed and encoded.
- next_tokens: The tokens that will be processed in the next iteration. This may be a subset of the full prompt due to chunked prefill.
- preallocated: The token slots that have been preallocated. The token array resizes to multiples of CHUNK_SIZE to accommodate new tokens.
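As a rough sketch (the index values below are hypothetical and chosen only for illustration), the indices partition the token array like this:

```python
import numpy as np

# Hypothetical snapshot mid-way through chunked prefill; real values depend
# on the prompt and CHUNK_SIZE.
tokens = np.zeros(16, dtype=np.int64)       # resized to a multiple of CHUNK_SIZE
start_idx, active_idx, end_idx = 3, 7, 7

completed = tokens[:start_idx]              # already processed and encoded
next_tokens = tokens[start_idx:active_idx]  # processed in this iteration
preallocated = tokens[end_idx:]             # empty slots for future tokens

active_length = active_idx - start_idx      # 4 here; typically 1 during token generation
```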
active_idx
property active_idx: int
active_length
property active_length: int
The number of tokens to be input during this iteration.
This will be the prompt size for context encoding, and simply 1 for token generation.
Type: Current sequence length
assign_to_cache()
Assigns the context to a cache slot.
bump_token_indices()
bump_token_indices(start_idx: int = 0, active_idx: int = 0, end_idx: int = 0, committed_idx: int = 0) → None
Update the start_idx, active_idx, end_idx, and committed_idx without manipulating the token array.
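For example, a scheduler might advance the window by one position after a decode step. This is a hedged sketch; the arguments are treated here as increments added to the current indices, which is what the defaults of 0 (no change) suggest:

```python
# Hypothetical bookkeeping after one decode step: shift the active window
# forward by one token. The token array itself is not modified.
ctx.bump_token_indices(start_idx=1, active_idx=1, end_idx=1)

# set_token_indices() is the absolute counterpart: indices passed as non-None
# values are overwritten directly rather than incremented.
```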
cache_seq_id
property cache_seq_id: int
Returns the cache slot assigned to the context, raising an error if not assigned.
committed_idx
property committed_idx: int
compute_num_available_steps()
Compute the max number of steps we can execute for a given context without exceeding the max_seq_len.
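A hedged sketch, assuming the method takes the model's max_seq_len as its argument (the signature is not shown above):

```python
# Hypothetical: ask how many decode steps fit before exceeding the model's
# maximum sequence length.
num_steps = ctx.compute_num_available_steps(max_seq_len=4096)
```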
current_length
property current_length: int
The current length of the sequence, including completed and active tokens.
end_idx
property end_idx: int
ignore_eos
property ignore_eos: bool
is_assigned_to_cache
property is_assigned_to_cache: bool
Returns True if input is assigned to a cache slot, False otherwise.
json_schema
A JSON schema to use during constrained decoding.
jump_ahead()
Updates the token array, while ensuring the new token is returned to the user.
log_probabilities
property log_probabilities: int
When > 0, returns the log probabilities for the top N tokens for each token in the sequence.
log_probabilities_echo
property log_probabilities_echo: bool
When True, the input tokens are added to the returned logprobs.
matcher
property matcher: 'xgr.GrammarMatcher' | None
An optional xgr.GrammarMatcher provided when using structured output.
max_length
The maximum length of this sequence.
next_tokens
property next_tokens: ndarray
The next prompt tokens to be input during this iteration.
This should be a 1D array of tokens of length active_length.
outstanding_completion_tokens()
outstanding_completion_tokens() → list[tuple[int, Optional[max.pipelines.interfaces.response.LogProbabilities]]]
Return the list of outstanding completion tokens and log probabilities that must be returned to the user.
reset()
reset() → None
Resets the context's state by combining all tokens into a new prompt. This method is used when a request is evicted, meaning that the context needs to be re-encoded in the following CE (context encoding) iteration.
set_matcher()
set_matcher(matcher: xgr.GrammarMatcher) → None
Set a grammar matcher for use during constrained decoding.
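A rough sketch of wiring up constrained decoding with xgrammar; the grammar-compilation calls below (TokenizerInfo.from_huggingface, GrammarCompiler, compile_json_schema) are assumptions about the xgrammar API and may differ across versions:

```python
import xgrammar as xgr

# Assumed xgrammar usage: build a matcher from a JSON schema. `tokenizer` is a
# hypothetical Hugging Face tokenizer; check the xgrammar docs for your version.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer)
compiler = xgr.GrammarCompiler(tokenizer_info)
compiled = compiler.compile_json_schema(json_schema)
matcher = xgr.GrammarMatcher(compiled)

# Attach the matcher so decoding is constrained to tokens the grammar accepts.
ctx.set_matcher(matcher)
```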
set_token_indices()
set_token_indices(start_idx: int | None = None, active_idx: int | None = None, end_idx: int | None = None, committed_idx: int | None = None) → None
Set the token indices without manipulating the token array.
start_idx
property start_idx: int
tokens
property tokens: ndarray
All tokens in the context.
unassign_from_cache()
unassign_from_cache() → None
Unassigns the context from a cache slot.
update()
update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None
Updates the next_tokens and extends existing tokens to include all generated tokens.
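A minimal sketch of a decode loop built around update() and outstanding_completion_tokens(). The generate_next_token step, max_new_tokens, and eos_token_id are placeholders for the pipeline's actual model execution and tokenizer state, not part of this API:

```python
# Hypothetical decode loop; `generate_next_token` stands in for real model
# execution and is not part of max.pipelines.context.
for _ in range(max_new_tokens):
    new_token, logprobs = generate_next_token(ctx)   # placeholder
    is_eos = new_token == eos_token_id               # placeholder EOS check
    ctx.update(new_token, log_probabilities=logprobs, is_eos=is_eos)
    if is_eos and not ctx.ignore_eos:
        break

# Drain the tokens (and log probabilities, if requested) that still need to be
# returned to the caller.
for token, token_logprobs in ctx.outstanding_completion_tokens():
    print(token, token_logprobs)
```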
TextAndVisionContext
class max.pipelines.context.TextAndVisionContext(cache_seq_id: int, prompt: str | Sequence[int], max_length: int | None, tokens: ndarray, pixel_values: Sequence[ndarray], extra_model_args: dict[str, Any], log_probabilities: int = 0, log_probabilities_echo: bool = False, json_schema: str | None = None, ignore_eos: bool = False)
A base class for model contexts, specifically for vision model variants.
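A hedged construction sketch based on the signature above; the token IDs, image shape, and extra_model_args values are placeholders that the pipeline's tokenizer and image processor would normally produce:

```python
import numpy as np
from max.pipelines.context import TextAndVisionContext

ctx = TextAndVisionContext(
    cache_seq_id=0,
    prompt="Describe this image.",
    max_length=2048,
    tokens=np.array([101, 202, 303], dtype=np.int64),          # placeholder token IDs
    pixel_values=[np.zeros((3, 336, 336), dtype=np.float32)],  # placeholder preprocessed image
    extra_model_args={},
)
```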
update()
update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None
Updates the next_tokens and extends existing tokens to include all generated tokens.
TextContext
class max.pipelines.context.TextContext(prompt: str | Sequence[int], max_length: int | None, tokens: ndarray, cache_seq_id: int | None = None, log_probabilities: int = 0, log_probabilities_echo: bool = False, json_schema: str | None = None, ignore_eos: bool = False)
A base class for model contexts, specifically for text model variants.
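A hedged construction sketch based on the signature above; the token IDs are placeholders that a real tokenizer would produce:

```python
import numpy as np
from max.pipelines.context import TextContext

ctx = TextContext(
    prompt="Hello, world",
    max_length=512,
    tokens=np.array([15496, 11, 995], dtype=np.int64),  # placeholder token IDs
    log_probabilities=5,        # return log probabilities for the top 5 tokens
    json_schema=None,           # or a JSON schema string for constrained decoding
)
```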
active_idx
property active_idx: int
active_length
property active_length: int
The number of tokens to be input during this iteration.
This will be the prompt size for context encoding, and simply 1 (or more) for token generation.
Type: Current sequence length
assign_to_cache()
bump_token_indices()
bump_token_indices(start_idx: int = 0, active_idx: int = 0, end_idx: int = 0, committed_idx: int = 0) → None
Update the start_idx, active_idx, end_idx, and committed_idx without manipulating the token array.
cache_seq_id
property cache_seq_id: int
committed_idx
property committed_idx: int
compute_num_available_steps()
Compute the max number of steps we can execute for a given context without exceeding the max_seq_len.
current_length
property current_length: int
The current length of the sequence, including completed and active tokens.
end_idx
property end_idx: int
is_assigned_to_cache
property is_assigned_to_cache: bool
jump_ahead()
Updates the token array, while ensuring the new token is returned to the user.
next_tokens
property next_tokens: ndarray
outstanding_completion_tokens()
outstanding_completion_tokens() → list[tuple[int, Optional[max.pipelines.interfaces.response.LogProbabilities]]]
Return the list of outstanding completion tokens and log probabilities that must be returned to the user.
reset()
reset() → None
Resets the context’s state by combining all tokens into a new prompt.
set_matcher()
set_matcher(matcher: xgr.GrammarMatcher) → None
set_token_indices()
set_token_indices(start_idx: int | None = None, active_idx: int | None = None, end_idx: int | None = None, committed_idx: int | None = None) → None
Set the token indices without manipulating the token array.
start_idx
property start_idx: int
tokens
property tokens: ndarray
unassign_from_cache()
unassign_from_cache() → None
update()
update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None
Updates the next_tokens and extends existing tokens to include all generated tokens.