
Mojo function

copy_kv_pages_d2h

def copy_kv_pages_d2h[dtype: DType](device_kv_blocks: LayoutTensor[dtype, Layout.row_major[Int(6)]()], host_kv_blocks: LayoutTensor[dtype, Layout.row_major[Int(6)]()], src_page_ids: LayoutTensor[DType.int64, Layout.row_major[Int(1)]()], dst_page_ids: LayoutTensor[DType.int64, Layout.row_major[Int(1)]()], layer_idx: Int, ctx: DeviceContext)

Copy selected pages for a single layer from device to host KV cache.

This function performs a true asynchronous GPU→CPU copy using enqueue_copy. It copies only the specified layer of each selected page. It takes separate source and destination page IDs so that the device and host caches can use independent page ID spaces.
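To make the page selection concrete, here is a minimal pure-Python reference model of the copy's effect. It is not the Mojo implementation and is not asynchronous. It assumes each cache is modeled as nested lists indexed `[page][kv][layer]`, with the innermost `page_size × num_heads × head_dim` block flattened to a plain list. The names `copy_layer_pages_reference` and `make_cache` are hypothetical, and the equal-length check on the two ID lists is an assumed precondition. Page IDs are paired element-wise, so device page `src_page_ids[i]` lands in host page `dst_page_ids[i]`, and every other layer of the destination page is left untouched.

```python
from typing import Callable, List

# Hypothetical model: cache[page][kv][layer] -> flat list holding the
# page_size * num_heads * head_dim values of that (page, kv, layer) slab.
Cache = List[List[List[List[float]]]]


def copy_layer_pages_reference(
    device: Cache,
    host: Cache,
    src_page_ids: List[int],
    dst_page_ids: List[int],
    layer_idx: int,
) -> None:
    """Sequential reference model of a single-layer device-to-host page copy."""
    # Assumed precondition: one destination ID per source ID.
    if len(src_page_ids) != len(dst_page_ids):
        raise ValueError("src_page_ids and dst_page_ids must have the same length")
    for src, dst in zip(src_page_ids, dst_page_ids):
        # Copy every kv entry (typically K and V), but only for layer_idx.
        for kv in range(len(device[src])):
            host[dst][kv][layer_idx] = list(device[src][kv][layer_idx])


def make_cache(num_pages: int, kv_dim: int, num_layers: int, slab: int,
               fill: Callable[[int, int, int], float]) -> Cache:
    """Build a nested-list cache whose slabs are filled with fill(page, kv, layer)."""
    return [[[[fill(p, k, l)] * slab for l in range(num_layers)]
             for k in range(kv_dim)] for p in range(num_pages)]


# Device and host use independent page ID spaces (8 vs. 4 pages).
device = make_cache(8, 2, 3, 4, lambda p, k, l: float(100 * p + 10 * k + l))
host = make_cache(4, 2, 3, 4, lambda p, k, l: 0.0)
copy_layer_pages_reference(device, host, src_page_ids=[7, 5], dst_page_ids=[0, 2], layer_idx=1)
print(host[0][1][1])  # [711.0, 711.0, 711.0, 711.0]: device page 7, kv 1, layer 1
print(host[0][1][0])  # [0.0, 0.0, 0.0, 0.0]: layer 0 was not copied
```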

The 6D tensor layout is: [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim]
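Because the layout is row-major and num_layers comes before page_size, num_heads, and head_dim, the data for one (page, kv, layer) triple occupies a single contiguous run of page_size * num_heads * head_dim elements. The Python sketch below computes that run's start offset from standard row-major strides. The helper names and the example sizes are hypothetical. Whether the Mojo implementation actually issues one enqueue_copy per run is an assumption, not something this page states.

```python
from math import prod
from typing import Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Element stride of each dimension of a row-major (C-order) tensor."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return tuple(strides)


def layer_slab(shape: Sequence[int], page: int, kv: int, layer: int) -> Tuple[int, int]:
    """Return (start offset, length) of the contiguous [page_size, num_heads, head_dim]
    block for one page, one kv index, and one layer of a cache laid out as
    [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim]."""
    strides = row_major_strides(shape)
    start = page * strides[0] + kv * strides[1] + layer * strides[2]
    length = prod(shape[3:])  # page_size * num_heads * head_dim
    return start, length


shape = (8, 2, 3, 16, 4, 64)  # illustrative sizes only
print(row_major_strides(shape))                  # (24576, 12288, 4096, 256, 64, 1)
print(layer_slab(shape, page=7, kv=1, layer=2))  # (192512, 4096)
```

For a fixed page and layer, consecutive kv indices are num_layers * page_size * num_heads * head_dim elements apart (12288 in this example). A single-layer copy therefore touches kv_dim separate runs per selected page rather than one contiguous block.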

Args: