Python module
context
KVCacheAwareContext
class max.nn.kv_cache.context.KVCacheAwareContext(*args, **kwargs)
A Protocol identifying the minimum API necessary for interacting with a KV Cache.
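Because KVCacheAwareContext is a Protocol, a class can satisfy it structurally by providing the required members, without inheriting from it. The sketch below illustrates this with Python's typing.Protocol. MiniContext and ToyContext are hypothetical stand-ins that cover only two of the members listed on this page; they are not the real MAX classes, and the is_done rule is invented for the example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MiniContext(Protocol):
    """Hypothetical, reduced stand-in for KVCacheAwareContext."""

    @property
    def current_length(self) -> int: ...

    @property
    def is_done(self) -> bool: ...


class ToyContext:
    """Hypothetical class that does not inherit from MiniContext."""

    def __init__(self, tokens: list, max_length: int) -> None:
        self._tokens = list(tokens)
        self._max_length = max_length

    @property
    def current_length(self) -> int:
        return len(self._tokens)

    @property
    def is_done(self) -> bool:
        # Toy rule for this sketch only: done once max_length is reached.
        return len(self._tokens) >= self._max_length


ctx = ToyContext(tokens=[1, 2, 3], max_length=8)
# isinstance() on a runtime_checkable Protocol only checks that the
# members exist; it does not validate their types or signatures.
satisfies_protocol = isinstance(ctx, MiniContext)
```

A static type checker performs the stricter check: it also compares the member types against the Protocol definition.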
active_idx
property active_idx: int
active_length
property active_length: int
Current sequence length: the number of tokens input during this iteration.
This is the prompt size for context encoding, and 1 for token generation.
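As a rough illustration of the two cases described for active_length, the sketch below assumes that the active tokens form the half-open range [start_idx, active_idx). That relationship, the helper name toy_active_length, and the token values are assumptions made for the example, not details confirmed by this page.

```python
def toy_active_length(start_idx: int, active_idx: int) -> int:
    # Assumption: the active window is the half-open range
    # [start_idx, active_idx) over the token array.
    return active_idx - start_idx


prompt = [101, 7592, 2088, 102]  # hypothetical 4-token prompt

# Context encoding: the whole prompt is input in one iteration.
ce_length = toy_active_length(start_idx=0, active_idx=len(prompt))

# Token generation: the prompt is already processed, and only the
# single newly generated token is input.
tg_length = toy_active_length(
    start_idx=len(prompt), active_idx=len(prompt) + 1
)
```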
assign_to_cache()
assign_to_cache(cache_seq_id)
Assigns the context to a cache slot.
Parameters: cache_seq_id (int)
Return type: None
bump_token_indices()
bump_token_indices(start_idx=0, active_idx=0, end_idx=0, committed_idx=0)
Updates start_idx, active_idx, end_idx, and committed_idx without manipulating the token array.
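The name and the zero defaults suggest that each argument is an offset added to the corresponding index. The sketch below shows that reading; the offset interpretation is an assumption, and ToyIndices is a hypothetical class, not the MAX implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyIndices:
    """Hypothetical holder for the four token indices."""

    start_idx: int = 0
    active_idx: int = 0
    end_idx: int = 0
    committed_idx: int = 0

    def bump_token_indices(
        self,
        start_idx: int = 0,
        active_idx: int = 0,
        end_idx: int = 0,
        committed_idx: int = 0,
    ) -> None:
        # Assumption: every argument is a delta; 0 leaves an index as-is.
        self.start_idx += start_idx
        self.active_idx += active_idx
        self.end_idx += end_idx
        self.committed_idx += committed_idx


# A 4-token prompt has just been encoded and one token was generated.
idx = ToyIndices(start_idx=0, active_idx=4, end_idx=4)
idx.bump_token_indices(start_idx=4, active_idx=1, end_idx=1)
```

Under this assumption the call leaves the indices at start_idx=4, active_idx=5, end_idx=5, and committed_idx unchanged.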
cache_seq_id
property cache_seq_id: int
Returns the cache slot assigned to the context, raising an error if not assigned.
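A minimal sketch of the cache-slot lifecycle follows, covering assign_to_cache(), cache_seq_id, is_assigned_to_cache, and unassign_from_cache(). ToyContext is hypothetical, and RuntimeError is an assumed exception type; this page only states that an error is raised.

```python
from typing import Optional


class ToyContext:
    """Hypothetical context that tracks a single cache slot."""

    def __init__(self) -> None:
        self._cache_seq_id: Optional[int] = None

    @property
    def is_assigned_to_cache(self) -> bool:
        return self._cache_seq_id is not None

    @property
    def cache_seq_id(self) -> int:
        if self._cache_seq_id is None:
            # Assumption: the real error type is not documented here.
            raise RuntimeError("context is not assigned to a cache slot")
        return self._cache_seq_id

    def assign_to_cache(self, cache_seq_id: int) -> None:
        self._cache_seq_id = cache_seq_id

    def unassign_from_cache(self) -> None:
        self._cache_seq_id = None


ctx = ToyContext()
ctx.assign_to_cache(3)
slot = ctx.cache_seq_id      # safe: the context has a slot
ctx.unassign_from_cache()    # reading cache_seq_id now raises
```

Checking is_assigned_to_cache before reading cache_seq_id avoids the error path.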
committed_idx
property committed_idx: int
compute_num_available_steps()
compute_num_available_steps(max_seq_len)
Computes the maximum number of steps that can be executed for a given context without exceeding max_seq_len.
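One plausible way to reason about this budget is sketched below. It assumes that each step appends exactly one token and that the total sequence length may not exceed max_seq_len. The helper toy_available_steps is hypothetical, and the exact accounting in MAX may differ, for example in how active tokens are counted.

```python
def toy_available_steps(current_length: int, max_seq_len: int) -> int:
    # Assumption: one new token per step, and the sequence must stay
    # at or below max_seq_len tokens in total.
    return max(0, max_seq_len - current_length)


# A context holding 6 tokens with a 10-token limit.
steps = toy_available_steps(current_length=6, max_seq_len=10)

# A context already at the limit has no steps left.
no_steps = toy_available_steps(current_length=10, max_seq_len=10)
```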
current_length
property current_length: int
The current length of the sequence, including completed and active tokens.
end_idx
property end_idx: int
eos_token_ids
is_assigned_to_cache
property is_assigned_to_cache: bool
Returns True if the context is assigned to a cache slot, False otherwise.
is_done
property is_done: bool
json_schema
A json schema to use during constrained decoding.
matcher
property matcher: xgr.GrammarMatcher | None
An optional xgr Grammar Matcher provided when using structured output.
max_length
The maximum length of this sequence.
next_tokens
property next_tokens: ndarray
The next prompt tokens to be input during this iteration.
This should be a 1D array of tokens of length active_length.
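The relationship between next_tokens and active_length can be sketched as a slice of the token array. The sketch assumes the active window is tokens[start_idx:active_idx] and uses a plain list for brevity, whereas the real property returns an ndarray. All values are hypothetical.

```python
tokens = [11, 12, 13, 14, 15, 16]  # hypothetical token array
start_idx, active_idx = 2, 5

# Assumption: the tokens input this iteration are the slice between
# start_idx (inclusive) and active_idx (exclusive).
next_tokens = tokens[start_idx:active_idx]
active_length = active_idx - start_idx

# The documented invariant: next_tokens has length active_length.
lengths_match = len(next_tokens) == active_length
```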
reset()
reset()
Resets the context's state by combining all tokens into a new prompt. This method is used when a request is evicted, meaning that the context must be re-encoded in the following context encoding (CE) iteration.
Return type: None
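The effect of reset() can be pictured as widening the active window back over every token seen so far. The sketch below is one way to express that; the index handling is an assumption, and ToyContext is a hypothetical class rather than the MAX implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyContext:
    """Hypothetical context with a token list and window indices."""

    tokens: List[int]
    start_idx: int
    active_idx: int
    end_idx: int

    def reset(self) -> None:
        # Assumption: prompt and generated tokens together become the
        # new prompt, so the active window spans all of them again.
        self.start_idx = 0
        self.active_idx = self.end_idx


# Mid-generation: five tokens are processed, one token is active.
ctx = ToyContext(
    tokens=[1, 2, 3, 4, 5, 6], start_idx=5, active_idx=6, end_idx=6
)
ctx.reset()
# After eviction, the next CE iteration must re-encode all six tokens.
to_encode = ctx.tokens[ctx.start_idx:ctx.active_idx]
```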
set_matcher()
set_matcher(matcher)
Set a grammar matcher for use during constrained decoding.
Parameters: matcher (xgr.GrammarMatcher)
Return type: None
set_token_indices()
set_token_indices(start_idx=None, active_idx=None, end_idx=None, committed_idx=None)
Set the token indices without manipulating the token array.
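The None defaults suggest that each argument is an absolute value and that omitted arguments leave the current index untouched. The sketch below shows that reading; the keep-on-None behavior is an assumption, and ToyIndices is a hypothetical class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyIndices:
    """Hypothetical holder for the four token indices."""

    start_idx: int = 0
    active_idx: int = 0
    end_idx: int = 0
    committed_idx: int = 0

    def set_token_indices(
        self,
        start_idx: Optional[int] = None,
        active_idx: Optional[int] = None,
        end_idx: Optional[int] = None,
        committed_idx: Optional[int] = None,
    ) -> None:
        # Assumption: None means "keep the current value".
        if start_idx is not None:
            self.start_idx = start_idx
        if active_idx is not None:
            self.active_idx = active_idx
        if end_idx is not None:
            self.end_idx = end_idx
        if committed_idx is not None:
            self.committed_idx = committed_idx


idx = ToyIndices(start_idx=4, active_idx=5, end_idx=5, committed_idx=4)
# Rewind only the start of the window; the other indices are kept.
idx.set_token_indices(start_idx=0)
```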
start_idx
property start_idx: int
status
property status: GenerationStatus
tokens
property tokens: ndarray
All tokens in the context.
unassign_from_cache()
unassign_from_cache()
Unassigns the context from a cache slot.
Return type: None
update_status()
update_status(status)
Parameters: status (GenerationStatus)
Return type: None