
Mojo function

copy_kv_pages_d2h

def copy_kv_pages_d2h[dtype: DType](device_kv_blocks: LayoutTensor[dtype, Layout.row_major[Int(6)]()], host_kv_blocks: LayoutTensor[dtype, Layout.row_major[Int(6)]()], src_page_ids: LayoutTensor[DType.int64, Layout.row_major[Int(1)]()], dst_page_ids: LayoutTensor[DType.int64, Layout.row_major[Int(1)]()], layer_idx: Int, ctx: DeviceContext)

Copy selected pages for a single layer from device to host KV cache.

This function performs a true asynchronous GPU→CPU copy using enqueue_copy. For each selected page it copies only the specified layer. Source and destination page IDs are passed separately, so the device and host caches can use independent page ID spaces.

The 6D tensor layout is: [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim]
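
Because the layout is row-major, the trailing three dimensions [page_size, num_heads, head_dim] are contiguous in memory for a fixed (page, kv, layer) index. One layer of one page is therefore not a single contiguous region but kv_dim separate contiguous chunks. The following Python sketch works through that offset arithmetic. The dimension values are hypothetical and chosen only for illustration, and the helper names are not part of the Mojo API.

```python
def row_major_strides(shape):
    """Element strides for a row-major (C-order) tensor of the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


# Hypothetical dimensions, for illustration only.
num_pages, kv_dim, num_layers, page_size, num_heads, head_dim = 4, 2, 3, 16, 8, 64
shape = (num_pages, kv_dim, num_layers, page_size, num_heads, head_dim)
strides = row_major_strides(shape)


def layer_chunk_offset(page_id, kv_idx, layer_idx):
    """Element offset of the first value for one (page, kv, layer) triple."""
    return page_id * strides[0] + kv_idx * strides[1] + layer_idx * strides[2]


# Number of contiguous elements behind each (page, kv, layer) triple.
chunk_elems = page_size * num_heads * head_dim

print(strides)
print(chunk_elems == strides[2])
print(layer_chunk_offset(page_id=1, kv_idx=1, layer_idx=2))
```

Under these assumed dimensions, the layer stride equals the chunk size, which is what makes each (page, kv, layer) slab copyable as one contiguous run.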

Args:

device_kv_blocks (LayoutTensor[dtype, Layout.row_major[Int(6)]()]): Source GPU KV cache blocks.
host_kv_blocks (LayoutTensor[dtype, Layout.row_major[Int(6)]()]): Destination CPU KV cache blocks.
src_page_ids (LayoutTensor[DType.int64, Layout.row_major[Int(1)]()]): 1D tensor of source page IDs in the GPU KV cache.
dst_page_ids (LayoutTensor[DType.int64, Layout.row_major[Int(1)]()]): 1D tensor of destination page IDs in the CPU KV cache.
layer_idx (Int): Which layer to copy.
ctx (DeviceContext): Device context for GPU operations.
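
The copy semantics described above can be modeled in plain Python. This is a minimal reference sketch, not the Mojo implementation: it uses flat lists in place of device and host buffers, runs synchronously, and assumes that src_page_ids and dst_page_ids have equal length and that both caches share the same per-page shape. The function name copy_kv_pages_reference and the page_shape parameter are hypothetical.

```python
def copy_kv_pages_reference(device_flat, host_flat, page_shape,
                            src_page_ids, dst_page_ids, layer_idx):
    """Copy one layer of each selected page from device_flat into host_flat.

    page_shape is (kv_dim, num_layers, page_size, num_heads, head_dim);
    the two buffers may hold different numbers of pages.
    """
    kv_dim, num_layers, page_size, num_heads, head_dim = page_shape
    chunk = page_size * num_heads * head_dim  # contiguous run per (page, kv, layer)
    kv_stride = num_layers * chunk
    page_stride = kv_dim * kv_stride

    # Assumption of this sketch: IDs are paired one-to-one.
    if len(src_page_ids) != len(dst_page_ids):
        raise ValueError("src_page_ids and dst_page_ids must have equal length")

    for src, dst in zip(src_page_ids, dst_page_ids):
        for kv in range(kv_dim):
            s = src * page_stride + kv * kv_stride + layer_idx * chunk
            d = dst * page_stride + kv * kv_stride + layer_idx * chunk
            host_flat[d:d + chunk] = device_flat[s:s + chunk]


# Tiny hypothetical example: kv_dim=2, num_layers=2, page_size=1, num_heads=1, head_dim=2.
page_shape = (2, 2, 1, 1, 2)
page_elems = 8
device = list(range(3 * page_elems))  # 3 device pages holding values 0..23
host = [-1] * (2 * page_elems)        # 2 host pages, initially unset

# Copy layer 1 of device page 2 into host page 0.
copy_kv_pages_reference(device, host, page_shape, [2], [0], layer_idx=1)
print(host)
```

Only the layer-1 slots of host page 0 change; layer 0 and host page 1 are left untouched. Since the real function enqueues its copies asynchronously, the host data should presumably not be read until the device context has been synchronized.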