Mojo module
lmcache_transfer
KV Cache transfer kernels for external cache integration (e.g., LMCache).
This module provides GPU kernels to efficiently transfer KV cache data between MAX's paged KV cache format and external contiguous formats like LMCache's KV_2LTD.
MAX PagedKVCacheCollection Layout: [total_num_blocks, kv_dim, num_layers, page_size, num_heads, head_size], where kv_dim = 2 (K and V) for standard attention and 1 for MLA.
External Contiguous Layout (KV_2LTD): [kv_dim, num_layers, num_tokens, hidden_dim], where hidden_dim = num_heads * head_size.
The kernels use slot_mapping to transfer data between the formats. slot_mapping[token_idx] gives the physical slot in the paged cache:
- block_id = slot // page_size
- offset_in_block = slot % page_size
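The slot decomposition above can be illustrated with a small Python sketch. The helper name `split_slot` and the sample values are hypothetical, chosen only for illustration; they are not part of the module's API.

```python
def split_slot(slot, page_size):
    """Split a physical slot into (block_id, offset_in_block).

    Hypothetical helper mirroring the two formulas in the text:
    block_id = slot // page_size, offset_in_block = slot % page_size.
    """
    return slot // page_size, slot % page_size


# Illustrative values only: a page size of 16 and three tokens whose
# physical slots are scattered across the paged cache.
page_size = 16
slot_mapping = [35, 36, 4]

for token_idx, slot in enumerate(slot_mapping):
    block_id, offset_in_block = split_slot(slot, page_size)
    print(token_idx, slot, block_id, offset_in_block)
```

With these sample values, slot 35 falls in block 2 at offset 3, since 35 = 2 * 16 + 3.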
Functions
- lmcache_offload: Offload KV cache data from MAX paged format to external contiguous format.
- lmcache_onload: Onload KV cache data from external contiguous format to MAX paged format.