Python module
context
InputContext
class max.pipelines.context.InputContext(*args, **kwargs)
A base class for model contexts, representing model inputs for TokenGenerators.
Token array layout:

```
+----------- full prompt ------------+                     CHUNK_SIZE*N
                                                                       v
+--------------------+---------------+-----------------+----------------+
| completed          | next_tokens   |                 | preallocated   |
+--------------------+---------------+-----------------+----------------+
           start_idx ^    active_idx ^         end_idx ^
```
- completed: The tokens that have already been processed and encoded.
- next_tokens: The tokens that will be processed in the next iteration. This may be a subset of the full prompt due to chunked prefill.
- preallocated: The token slots that have been preallocated. The token array resizes to multiples of CHUNK_SIZE to accommodate new tokens.
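As a rough sketch (the index values below are hypothetical and chosen only for illustration), the indices partition the token array like this:

```python
import numpy as np

# Hypothetical snapshot mid-way through chunked prefill; real values depend
# on the prompt and CHUNK_SIZE.
tokens = np.zeros(16, dtype=np.int64)       # resized to a multiple of CHUNK_SIZE
start_idx, active_idx, end_idx = 3, 7, 7

completed = tokens[:start_idx]              # already processed and encoded
next_tokens = tokens[start_idx:active_idx]  # processed in this iteration
preallocated = tokens[end_idx:]             # empty slots for future tokens

active_length = active_idx - start_idx      # 4 here; typically 1 during token generation
```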
active_idx
property active_idx: int
active_length
property active_length: int
The number of tokens to be input during this iteration.
This will be the prompt size for context encoding, and simply 1 for token generation.
Type: Current sequence length
assign_to_cache()
Assigns the context to a cache slot.
bump_token_indices()
bump_token_indices(start_idx: int = 0, active_idx: int = 0, end_idx: int = 0, committed_idx: int = 0) → None
Update the start_idx, active_idx, end_idx, and committed_idx without manipulating the token array.
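For example, a scheduler might advance the window by one position after a decode step. This is a hedged sketch; the arguments are treated here as increments added to the current indices, which is what the defaults of 0 (no change) suggest:

```python
# Hypothetical bookkeeping after one decode step: shift the active window
# forward by one token. The token array itself is not modified.
ctx.bump_token_indices(start_idx=1, active_idx=1, end_idx=1)

# set_token_indices() is the absolute counterpart: indices passed as non-None
# values are overwritten directly rather than incremented.
```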
cache_seq_id
property cache_seq_id: int
Returns the cache slot assigned to the context, raising an error if not assigned.
committed_idx
property committed_idx: int
compute_num_available_steps()
Compute the max number of steps we can execute for a given context without exceeding the max_seq_len.
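A hedged sketch, assuming the method takes the model's max_seq_len as its argument (the signature is not shown above):

```python
# Hypothetical: ask how many decode steps fit before exceeding the model's
# maximum sequence length.
num_steps = ctx.compute_num_available_steps(max_seq_len=4096)
```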
current_length
property current_length: int
The current length of the sequence, including completed and active tokens.
end_idx
property end_idx: int
ignore_eos
property ignore_eos: bool
is_assigned_to_cache
property is_assigned_to_cache: bool
Returns True if input is assigned to a cache slot, False otherwise.
json_schema
A JSON schema to use during constrained decoding.
jump_ahead()
Updates the token array, while ensuring the new token is returned to the user.
log_probabilities
property log_probabilities: int
When > 0, returns the log probabilities for the top N tokens for each token in the sequence.
log_probabilities_echo
property log_probabilities_echo: bool
When True, the input tokens are added to the returned logprobs.
matcher
property matcher: 'xgr.GrammarMatcher' | None
An optional xgr.GrammarMatcher provided when using structured output.
max_length
The maximum length of this sequence.
next_tokens
property next_tokens: ndarray
The next prompt tokens to be input during this iteration.
This should be a 1D array of tokens of length active_length.
outstanding_completion_tokens()
outstanding_completion_tokens() → list[tuple[int, Optional[max.pipelines.interfaces.response.LogProbabilities]]]
Return the list of outstanding completion tokens and log probabilities that must be returned to the user.
reset()
reset() → None
Resets the context's state by combining all tokens into a new prompt. This method is used when a request is evicted, meaning that the context needs to be re-encoded in the following CE (context encoding) iteration.
set_matcher()
set_matcher(matcher: xgr.GrammarMatcher) → None
Set a grammar matcher for use during constrained decoding.
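A rough sketch of wiring up constrained decoding with xgrammar; the grammar-compilation calls below (TokenizerInfo.from_huggingface, GrammarCompiler, compile_json_schema) are assumptions about the xgrammar API and may differ across versions:

```python
import xgrammar as xgr

# Assumed xgrammar usage: build a matcher from a JSON schema. `tokenizer` is a
# hypothetical Hugging Face tokenizer; check the xgrammar docs for your version.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer)
compiler = xgr.GrammarCompiler(tokenizer_info)
compiled = compiler.compile_json_schema(json_schema)
matcher = xgr.GrammarMatcher(compiled)

# Attach the matcher so decoding is constrained to tokens the grammar accepts.
ctx.set_matcher(matcher)
```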
set_token_indices()
set_token_indices(start_idx: int | None = None, active_idx: int | None = None, end_idx: int | None = None, committed_idx: int | None = None) → None
Set the token indices without manipulating the token array.
start_idx
property start_idx: int
tokens
property tokens: ndarray
All tokens in the context.
unassign_from_cache()
unassign_from_cache() → None
Unassigns the context from a cache slot.
update()
update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None
Updates the next_tokens and extends existing tokens to include all generated tokens.
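A minimal sketch of a decode loop built around update() and outstanding_completion_tokens(). The generate_next_token step, max_new_tokens, and eos_token_id are placeholders for the pipeline's actual model execution and tokenizer state, not part of this API:

```python
# Hypothetical decode loop; `generate_next_token` stands in for real model
# execution and is not part of max.pipelines.context.
for _ in range(max_new_tokens):
    new_token, logprobs = generate_next_token(ctx)   # placeholder
    is_eos = new_token == eos_token_id               # placeholder EOS check
    ctx.update(new_token, log_probabilities=logprobs, is_eos=is_eos)
    if is_eos and not ctx.ignore_eos:
        break

# Drain the tokens (and log probabilities, if requested) that still need to be
# returned to the caller.
for token, token_logprobs in ctx.outstanding_completion_tokens():
    print(token, token_logprobs)
```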
TextAndVisionContext
class max.pipelines.context.TextAndVisionContext(cache_seq_id: int, prompt: str | Sequence[int], max_length: int | None, tokens: ndarray, pixel_values: Sequence[ndarray], extra_model_args: dict[str, Any], log_probabilities: int = 0, log_probabilities_echo: bool = False, json_schema: str | None = None, ignore_eos: bool = False)
A base class for model contexts, specifically for vision model variants.
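A hedged construction sketch based on the signature above; the token IDs, image shape, and extra_model_args values are placeholders that the pipeline's tokenizer and image processor would normally produce:

```python
import numpy as np
from max.pipelines.context import TextAndVisionContext

ctx = TextAndVisionContext(
    cache_seq_id=0,
    prompt="Describe this image.",
    max_length=2048,
    tokens=np.array([101, 202, 303], dtype=np.int64),          # placeholder token IDs
    pixel_values=[np.zeros((3, 336, 336), dtype=np.float32)],  # placeholder preprocessed image
    extra_model_args={},
)
```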
update()
update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None
Updates the next_tokens and extends existing tokens to include all generated tokens.
TextContext
class max.pipelines.context.TextContext(prompt: str | Sequence[int], max_length: int | None, tokens: ndarray, cache_seq_id: int | None = None, log_probabilities: int = 0, log_probabilities_echo: bool = False, json_schema: str | None = None, ignore_eos: bool = False)
A base class for model contexts, specifically for text model variants.
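A hedged construction sketch based on the signature above; the token IDs are placeholders that a real tokenizer would produce:

```python
import numpy as np
from max.pipelines.context import TextContext

ctx = TextContext(
    prompt="Hello, world",
    max_length=512,
    tokens=np.array([15496, 11, 995], dtype=np.int64),  # placeholder token IDs
    log_probabilities=5,        # return log probabilities for the top 5 tokens
    json_schema=None,           # or a JSON schema string for constrained decoding
)
```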
active_idx
property active_idx: int
active_length
property active_length: int
The number of tokens to be input during this iteration.
This will be the prompt size for context encoding, and simply 1 (or more) for token generation.
Type: Current sequence length
assign_to_cache()
bump_token_indices()
bump_token_indices(start_idx: int = 0, active_idx: int = 0, end_idx: int = 0, committed_idx: int = 0) → None
Update the start_idx, active_idx, end_idx, and committed_idx without manipulating the token array.
cache_seq_id
property cache_seq_id: int
committed_idx
property committed_idx: int
compute_num_available_steps()
Compute the max number of steps we can execute for a given context without exceeding the max_seq_len.
current_length
property current_length: int
The current length of the sequence, including completed and active tokens.
end_idx
property end_idx: int
is_assigned_to_cache
property is_assigned_to_cache: bool
jump_ahead()
Updates the token array, while ensuring the new token is returned to the user.
next_tokens
property next_tokens: ndarray
outstanding_completion_tokens()
outstanding_completion_tokens() → list[tuple[int, Optional[max.pipelines.interfaces.response.LogProbabilities]]]
Return the list of outstanding completion tokens and log probabilities that must be returned to the user.
reset()
reset() → None
Resets the context’s state by combining all tokens into a new prompt.
set_matcher()
set_matcher(matcher: xgr.GrammarMatcher) → None
set_token_indices()
set_token_indices(start_idx: int | None = None, active_idx: int | None = None, end_idx: int | None = None, committed_idx: int | None = None) → None
Set the token indices without manipulating the token array.
start_idx
property start_idx: int
tokens
property tokens: ndarray
unassign_from_cache()
unassign_from_cache() → None
update()
update(new_token: int, log_probabilities: LogProbabilities | None = None, is_eos: bool = False) → None
Updates the next_tokens and extends existing tokens to include all generated tokens.