Mojo module

lmcache_transfer

KV Cache transfer kernels for external cache integration (e.g., LMCache).

This module provides GPU kernels to efficiently transfer KV cache data between MAX's paged KV cache format and external contiguous formats like LMCache's KV_2LTD.

MAX PagedKVCacheCollection Layout: [total_num_blocks, kv_dim, num_layers, page_size, num_heads, head_size] where kv_dim = 2 (K and V) for standard attention and kv_dim = 1 for MLA (multi-head latent attention)

External Contiguous Layout (KV_2LTD): [kv_dim, num_layers, num_tokens, hidden_dim] where hidden_dim = num_heads * head_size
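To make the two layouts concrete, the sketch below computes the flat offset of one element in each of them. It assumes both buffers are densely packed in row-major order, which the text above does not state explicitly; the helper `row_major_offset` and all the sizes are hypothetical and chosen only for illustration. It also assumes that the hidden dimension is flattened as head * head_size + dim.

```python
def row_major_offset(index, shape):
    """Flat offset of `index` in a densely packed row-major array of `shape`.

    Assumption: no padding or custom strides. The real buffers may differ.
    """
    assert len(index) == len(shape)
    offset = 0
    for i, dim in zip(index, shape):
        assert 0 <= i < dim, "index out of bounds"
        offset = offset * dim + i
    return offset


# Hypothetical sizes, for illustration only.
total_num_blocks, kv_dim, num_layers = 4, 2, 3
page_size, num_heads, head_size = 8, 2, 16
num_tokens = 10
hidden_dim = num_heads * head_size

paged_shape = (total_num_blocks, kv_dim, num_layers, page_size, num_heads, head_size)
contiguous_shape = (kv_dim, num_layers, num_tokens, hidden_dim)

# V (kv = 1), layer 1, head 1, dim 5 of the token stored at block 2, offset 3.
paged_offset = row_major_offset((2, 1, 1, 3, 1, 5), paged_shape)

# The same element for token 7 in the contiguous layout.
# Assumption: hidden index = head * head_size + dim.
contiguous_offset = row_major_offset((1, 1, 7, 1 * head_size + 5), contiguous_shape)

print(paged_offset, contiguous_offset)
```

Note that the paged layout puts the block index outermost, so all layers of a block sit together, while the contiguous layout puts K/V and layer outermost, so all tokens of one layer sit together.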

The kernels use slot_mapping to transfer data between the formats. slot_mapping[token_idx] gives the physical slot in the paged cache:

block_id = slot // page_size
offset_in_block = slot % page_size
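The slot arithmetic can be illustrated with a CPU-side reference in plain Python. This is a minimal sketch of the indexing only, not the module's GPU kernels: the function names (`split_slot`, `paged_to_contiguous`, `contiguous_to_paged`) are hypothetical, the caches are modelled as nested lists, and the hidden dimension is assumed to be flattened as head * head_size + dim.

```python
from itertools import count, repeat


def filled(shape, values):
    """Nested list of the given shape, filled from the iterator `values`."""
    if not shape:
        return next(values)
    return [filled(shape[1:], values) for _ in range(shape[0])]


def split_slot(slot, page_size):
    """Decompose a physical slot into (block_id, offset_in_block)."""
    return slot // page_size, slot % page_size


def paged_to_contiguous(paged, slot_mapping, page_size):
    """Gather tokens from the paged layout into [kv, layer, token, hidden]."""
    kv_dim, num_layers = len(paged[0]), len(paged[0][0])
    num_heads, head_size = len(paged[0][0][0][0]), len(paged[0][0][0][0][0])
    shape = (kv_dim, num_layers, len(slot_mapping), num_heads * head_size)
    out = filled(shape, repeat(0.0))
    for token_idx, slot in enumerate(slot_mapping):
        block_id, offset = split_slot(slot, page_size)
        for kv in range(kv_dim):
            for layer in range(num_layers):
                row = paged[block_id][kv][layer][offset]
                for head in range(num_heads):
                    for d in range(head_size):
                        # Assumed flattening: hidden = head * head_size + d.
                        out[kv][layer][token_idx][head * head_size + d] = row[head][d]
    return out


def contiguous_to_paged(contiguous, paged, slot_mapping, page_size):
    """Scatter tokens from the contiguous layout into `paged`, in place."""
    num_heads, head_size = len(paged[0][0][0][0]), len(paged[0][0][0][0][0])
    for token_idx, slot in enumerate(slot_mapping):
        block_id, offset = split_slot(slot, page_size)
        for kv in range(len(contiguous)):
            for layer in range(len(contiguous[kv])):
                src = contiguous[kv][layer][token_idx]
                for head in range(num_heads):
                    for d in range(head_size):
                        paged[block_id][kv][layer][offset][head][d] = src[head * head_size + d]


# Hypothetical sizes: 3 blocks, K and V, 2 layers, page_size 4, 2 heads, head_size 3.
page_size = 4
paged_shape = (3, 2, 2, page_size, 2, 3)
paged = filled(paged_shape, count())   # unique value per element
slot_mapping = [9, 2, 5]               # tokens 0, 1, 2 live in slots 9, 2, 5

contiguous = paged_to_contiguous(paged, slot_mapping, page_size)

restored = filled(paged_shape, repeat(0.0))
contiguous_to_paged(contiguous, restored, slot_mapping, page_size)
```

Here token 0 maps to slot 9, which is block 2 at offset 1, so `contiguous[kv][layer][0]` holds the two heads of `paged[2][kv][layer][1]` laid end to end. Scattering into an empty cache restores exactly the mapped slots and leaves the rest untouched.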

Functions
