Python module

context

KVCacheAwareContext

class max.nn.kv_cache.context.KVCacheAwareContext(*args, **kwargs)

A Protocol identifying the minimum API necessary for interacting with a KV Cache.
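Because this is a Protocol, any context object that exposes the members documented below can be used wherever a KVCacheAwareContext is expected; no subclassing is required. A minimal sketch, using only members from this page:

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def describe(ctx: KVCacheAwareContext) -> str:
    # Any object that satisfies the protocol can be passed here.
    slot = ctx.cache_seq_id if ctx.is_assigned_to_cache else None
    return f"length={ctx.current_length}, done={ctx.is_done}, cache_slot={slot}"
```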

active_idx

property active_idx: int

active_length

property active_length: int

Current sequence length: the number of tokens input this iteration.

This will be the prompt size for context encoding, and simply 1 for token generation.

assign_to_cache()

assign_to_cache(cache_seq_id)

Assigns the context to a cache slot.

Parameters:

cache_seq_id (int)

Return type:

None
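A hedged sketch of claiming a slot, where the pool of free slot IDs is an illustrative structure owned by the caller rather than part of this API:

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def claim_slot(ctx: KVCacheAwareContext, free_slots: list[int]) -> None:
    # `free_slots` is an illustrative pool of slot IDs managed by the caller.
    # After assignment, ctx.cache_seq_id returns the claimed slot.
    if not ctx.is_assigned_to_cache:
        ctx.assign_to_cache(free_slots.pop())
```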

bump_token_indices()

bump_token_indices(start_idx=0, active_idx=0, end_idx=0, committed_idx=0)

Update the start_idx, active_idx, end_idx, and committed_idx without manipulating the token array.

Parameters:

  • start_idx (int)
  • active_idx (int)
  • end_idx (int)
  • committed_idx (int)

Return type:

None
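A sketch of advancing the indices after a decode step, under the assumption that the arguments are increments applied to the current indices (set_token_indices, below, takes absolute positions):

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def advance_one_token(ctx: KVCacheAwareContext) -> None:
    # Assumption: arguments are offsets added to the current indices.
    # After a token-generation step, the active window slides forward by 1.
    ctx.bump_token_indices(active_idx=1, end_idx=1)
```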

cache_seq_id

property cache_seq_id: int

Returns the cache slot assigned to the context, raising an error if not assigned.

committed_idx

property committed_idx: int

compute_num_available_steps()

compute_num_available_steps(max_seq_len)

Compute the maximum number of steps that can be executed for this context without exceeding max_seq_len.

Parameters:

max_seq_len (int)

Return type:

int
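A small sketch of how a scheduler might use this to clamp a requested step count; max_seq_len and requested are illustrative parameters:

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def steps_to_schedule(ctx: KVCacheAwareContext, max_seq_len: int, requested: int) -> int:
    # Clamp the requested number of generation steps so the sequence can
    # never grow past max_seq_len.
    return min(requested, ctx.compute_num_available_steps(max_seq_len))
```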

current_length

property current_length: int

The current length of the sequence, including completed and active tokens.

end_idx

property end_idx: int

eos_token_ids

property eos_token_ids: set[int]

is_assigned_to_cache

property is_assigned_to_cache: bool

Returns True if the context is assigned to a cache slot, False otherwise.

is_done

property is_done: bool

json_schema

property json_schema: str | None

A JSON schema to use during constrained decoding.

matcher

property matcher: xgr.GrammarMatcher | None

An optional xgr.GrammarMatcher provided when using structured output.

max_length

property max_length: int | None

The maximum length of this sequence.

next_tokens

property next_tokens: ndarray

The next prompt tokens to be input during this iteration.

This should be a 1D array of tokens of length active_length.
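A sketch of reading the per-iteration input slice, checking the documented shape invariant:

```python
import numpy as np

from max.nn.kv_cache.context import KVCacheAwareContext

def next_token_batch(ctx: KVCacheAwareContext) -> np.ndarray:
    # The slice fed to the model this iteration: the full prompt during
    # context encoding, or a single token during token generation.
    tokens = ctx.next_tokens
    assert tokens.ndim == 1 and tokens.shape[0] == ctx.active_length
    return tokens
```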

reset()

reset()

Resets the context's state by combining all tokens into a new prompt. This method is used when a request is evicted, meaning the context needs to be re-encoded in the following context-encoding (CE) iteration.

Return type:

None
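A hedged sketch of eviction handling, assuming the caller also releases the cache slot:

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def evict(ctx: KVCacheAwareContext) -> None:
    # Release the cache slot, then fold all tokens back into a fresh
    # prompt so the request can be re-encoded from scratch later.
    if ctx.is_assigned_to_cache:
        ctx.unassign_from_cache()
    ctx.reset()
```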

set_matcher()

set_matcher(matcher)

Set a grammar matcher for use during constrained decoding.

Parameters:

matcher (xgr.GrammarMatcher)

Return type:

None
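A sketch of wiring up structured output; constructing the matcher itself (via the xgrammar package) is outside this API and is assumed to happen elsewhere:

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def enable_constrained_decoding(ctx: KVCacheAwareContext, matcher) -> None:
    # `matcher` is assumed to be an xgr.GrammarMatcher built elsewhere,
    # e.g. from the schema exposed by ctx.json_schema. Once set, it is
    # readable back via the `matcher` property.
    ctx.set_matcher(matcher)
```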

set_token_indices()

set_token_indices(start_idx=None, active_idx=None, end_idx=None, committed_idx=None)

Set the token indices without manipulating the token array.

Parameters:

  • start_idx (int | None)
  • active_idx (int | None)
  • end_idx (int | None)
  • committed_idx (int | None)

Return type:

None
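Unlike bump_token_indices, this takes absolute positions. A sketch, assuming that indices left at None are not changed:

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def rewind_active_window(ctx: KVCacheAwareContext, new_active_idx: int) -> None:
    # Move only the active index to an absolute position; indices left at
    # None (the default) are assumed to stay as they are.
    ctx.set_token_indices(active_idx=new_active_idx)
```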

start_idx

property start_idx: int

status

property status: GenerationStatus

tokens

property tokens: ndarray

All tokens in the context.

unassign_from_cache()

unassign_from_cache()

Unassigns the context from a cache slot.

Return type:

None

update_status()

update_status(status)

Parameters:

status (GenerationStatus)

Return type:

None
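A sketch of ending a sequence once an EOS token appears; the end-of-sequence member of GenerationStatus is passed in rather than named, since its import path is not shown on this page:

```python
from max.nn.kv_cache.context import KVCacheAwareContext

def finish_if_eos(ctx: KVCacheAwareContext, last_token: int, eos_status) -> None:
    # `eos_status` is assumed to be the end-of-sequence member of
    # GenerationStatus.
    if last_token in ctx.eos_token_ids and not ctx.is_done:
        ctx.update_status(eos_status)
```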