Python module
radix_trie
RadixTrie
class max.pipelines.kv_cache.radix_trie.RadixTrie(page_size: int = 1)
This RadixTrie is specially designed for prefix caching in paged attention.
The RadixTrie allows for efficient insertion and matching of sequences. It matches each prefix of tokens in a sequence to its corresponding blocks. Compared to a naive trie, the RadixTrie allows storing multiple tokens in a single node for less indirection and faster access.
Blocks in the RadixTrie should be immutable and committed. If it is in the RadixTrie, it is eligible for sharing. An inflight or uncommitted block that is being written to by a sequence should not be in the RadixTrie.
The RadixTrie allows for an LRU eviction policy for its leaves. We only allow evictions if no active sequences are using the node.
Currently, the RadixTrie assumes that the paged KVCache page size is 1.
This implementation is based off of SGLang: : - https://github.com/sgl-project/sglang/blob/337fe53ac41c68d6f171ef3b446f55eb0e98f77c/python/sglang/srt/mem_cache/radix_cache.py#L58
evict_blocks()
Attempt to evict at most desired_num_evicted blocks from trie.
insert()
insert(tokens: ndarray | List[Any], blocks: List[Any], node: TrieNode | None = None) → TrieNode
Inserts tokens and blocks into the trie.
We assume that each block contains exactly one token so the length of both input lists must match.
-
Parameters:
- tokens – Tokens to insert into trie
- blocks – KV cache block for each token
- node – Node to begin insertion at. If this is not a leaf node, blocks in the tree are overwritten.
-
Returns:
Node corresponding to end of the sequence where future : generated tokens can be inserted
-
Return type:
trie_node
mark_in_use_by()
Climb up the trie starting from node, marking each node as being in use by this seq.
mark_not_in_use_by()
Climb up the trie starting from node, marking each node as no longer in use by this seq. Since nodes without any users may be eligible for eviction, we also update its last_access_time.
match_prefix()
match_prefix(tokens: ndarray | List[Any], node: TrieNode | None = None) → Tuple[TrieNode, List[Any]]
Matches the input tokens with the contents of the trie.
-
Parameters:
- tokens – tokens to search the trie for
- node – Node to begin matching at.
-
Returns:
- trie_node: Node corresponding to end of matched prefix where : future generated tokens can be inserted. This is a leaf node.
- block_list: KV cache blocks for matched prefix
-
Return type:
Tuple containing
pretty_format()
Formats the contents of the trie.
pretty_print()
pretty_print(print_blocks: bool = True)
Prints the contents of the trie.
TrieNode
class max.pipelines.kv_cache.radix_trie.TrieNode
A TrieNode consists of a list of tokens and blocks.
- Tokens are the ids of the tokens in the sequence.
- Blocks are the offsets into the KVCache region that back the KV entries for a given token. I.e: the page index
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!