Mojo module

lmcache_transfer

KV cache transfer kernels for external cache integration (e.g., LMCache).

This module provides GPU kernels to efficiently transfer KV cache data between MAX's paged KV cache format and external contiguous formats like LMCache's KV_2LTD.

MAX PagedKVCacheCollection layout:

[total_num_blocks, kv_dim, num_layers, page_size, num_heads, head_size]

where kv_dim = 2 (K and V) for standard attention and kv_dim = 1 for MLA.

External contiguous layout (KV_2LTD):

[kv_dim, num_layers, num_tokens, hidden_dim]

where hidden_dim = num_heads * head_size.

The kernels use slot_mapping to transfer data between the two formats. slot_mapping[token_idx] gives the physical slot of a token in the paged cache, which splits into a block and an offset within that block:

block_id = slot // page_size
offset_in_block = slot % page_size
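
The slot arithmetic and the two layouts can be illustrated with a small plain-Python sketch. This is not the Mojo kernel code. The helper names (split_slot, paged_offset, contiguous_offset) are hypothetical, and the sketch assumes both buffers are stored densely in row-major order with the dimensions in the order listed above, which the module description does not state explicitly.

```python
def split_slot(slot, page_size):
    """Decompose a physical slot into (block_id, offset_in_block)."""
    return slot // page_size, slot % page_size


def paged_offset(block_id, kv_idx, layer, offset_in_block, head, dim,
                 kv_dim, num_layers, page_size, num_heads, head_size):
    """Flat offset into a buffer shaped
    [total_num_blocks, kv_dim, num_layers, page_size, num_heads, head_size].
    Assumption: dense row-major storage."""
    idx = block_id
    idx = idx * kv_dim + kv_idx
    idx = idx * num_layers + layer
    idx = idx * page_size + offset_in_block
    idx = idx * num_heads + head
    idx = idx * head_size + dim
    return idx


def contiguous_offset(kv_idx, layer, token_idx, hidden_idx,
                      num_layers, num_tokens, hidden_dim):
    """Flat offset into a buffer shaped
    [kv_dim, num_layers, num_tokens, hidden_dim].
    Assumption: dense row-major storage."""
    idx = kv_idx
    idx = idx * num_layers + layer
    idx = idx * num_tokens + token_idx
    idx = idx * hidden_dim + hidden_idx
    return idx


# Example: two tokens stored at physical slots 37 and 5, with 16 slots per page.
slot_mapping = [37, 5]
page_size = 16
print([split_slot(s, page_size) for s in slot_mapping])  # [(2, 5), (0, 5)]
```

Under these assumptions, token 0 lives in block 2 at offset 5, and token 1 lives in block 0 at offset 5, even though the two tokens are adjacent in the contiguous layout.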

Functions

lmcache_offload: Offload KV cache data from MAX paged format to external contiguous format.
lmcache_onload: Onload KV cache data from external contiguous format to MAX paged format.
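
The intended data movement of the two functions can be sketched as a CPU reference in plain Python. The function names offload_reference and onload_reference are hypothetical and do not reflect the signatures of lmcache_offload or lmcache_onload. The sketch assumes dense row-major buffers and that the hidden dimension is flattened head-major (hidden_idx = head * head_size + dim); neither detail is confirmed by the module description.

```python
def paged_index(slot, kv, layer, hidden_idx, dims):
    """Flat index into the paged buffer for one element of one token."""
    kv_dim, num_layers, page_size, num_heads, head_size = dims
    block_id, offset = divmod(slot, page_size)
    head, d = divmod(hidden_idx, head_size)  # assumed head-major flattening
    idx = (block_id * kv_dim + kv) * num_layers + layer
    idx = (idx * page_size + offset) * num_heads + head
    return idx * head_size + d


def contiguous_index(kv, layer, token, hidden_idx, dims, num_tokens):
    """Flat index into the [kv_dim, num_layers, num_tokens, hidden_dim] buffer."""
    _, num_layers, _, num_heads, head_size = dims
    hidden_dim = num_heads * head_size
    return ((kv * num_layers + layer) * num_tokens + token) * hidden_dim + hidden_idx


def offload_reference(paged, slot_mapping, dims):
    """Gather the mapped tokens from the paged buffer into a contiguous one."""
    kv_dim, num_layers, _, num_heads, head_size = dims
    hidden_dim = num_heads * head_size
    n = len(slot_mapping)
    out = [0.0] * (kv_dim * num_layers * n * hidden_dim)
    for kv in range(kv_dim):
        for layer in range(num_layers):
            for token, slot in enumerate(slot_mapping):
                for h in range(hidden_dim):
                    src = paged_index(slot, kv, layer, h, dims)
                    out[contiguous_index(kv, layer, token, h, dims, n)] = paged[src]
    return out


def onload_reference(external, paged, slot_mapping, dims):
    """Scatter a contiguous buffer back into the paged buffer, in place."""
    kv_dim, num_layers, _, num_heads, head_size = dims
    hidden_dim = num_heads * head_size
    n = len(slot_mapping)
    for kv in range(kv_dim):
        for layer in range(num_layers):
            for token, slot in enumerate(slot_mapping):
                for h in range(hidden_dim):
                    src = contiguous_index(kv, layer, token, h, dims, n)
                    paged[paged_index(slot, kv, layer, h, dims)] = external[src]


# dims = (kv_dim, num_layers, page_size, num_heads, head_size); 3 blocks in total.
dims = (2, 2, 4, 2, 3)
paged = [float(i) for i in range(3 * 2 * 2 * 4 * 2 * 3)]
slot_mapping = [9, 2, 4]

external = offload_reference(paged, slot_mapping, dims)
restored = [0.0] * len(paged)
onload_reference(external, restored, slot_mapping, dims)
```

After the round trip, restored matches paged at every position belonging to a mapped slot and is left untouched elsewhere. A GPU kernel would presumably parallelize the same index mapping across threads rather than loop over it.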
