
Mojo function

copy_kv_pages_d2h

copy_kv_pages_d2h[dtype: DType](device_kv_blocks: LayoutTensor[dtype, Layout.row_major[6]()], host_kv_blocks: LayoutTensor[dtype, Layout.row_major[6]()], src_page_ids: LayoutTensor[DType.int64, Layout.row_major[1]()], dst_page_ids: LayoutTensor[DType.int64, Layout.row_major[1]()], layer_idx: Int, ctx: DeviceContext)

Copy selected pages for a single layer from device to host KV cache.

This function performs an asynchronous GPU→CPU copy using enqueue_copy. For each page, it copies only the specified layer. Source and destination page IDs are passed separately, so the device and host caches can use independent page ID spaces.
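The intended semantics can be sketched with a small pure-Python reference model. It is not the Mojo implementation and does not call enqueue_copy. The helper name `copy_kv_pages_d2h_model` and the nested-list representation are hypothetical. The sketch also assumes that `src_page_ids` indexes the device cache, that `dst_page_ids` indexes the host cache, and that the two lists are paired element by element.

```python
def copy_kv_pages_d2h_model(device_kv, host_kv, src_page_ids, dst_page_ids, layer_idx):
    """Hypothetical model: caches are nested lists indexed [page][kv][layer]."""
    # Assumption: the two ID lists are paired element-wise.
    if len(src_page_ids) != len(dst_page_ids):
        raise ValueError("src_page_ids and dst_page_ids must have equal length")
    for src, dst in zip(src_page_ids, dst_page_ids):
        for kv in range(len(device_kv[src])):
            # Only the requested layer is copied; other layers of the
            # destination page are left untouched.
            host_kv[dst][kv][layer_idx] = list(device_kv[src][kv][layer_idx])


# Tiny example: 3 device pages, 2 host pages, kv_dim = 2, 2 layers,
# and 4 values standing in for each [page_size, num_heads, head_dim] slab.
device_kv = [
    [[[p * 100 + kv * 10 + layer] * 4 for layer in range(2)] for kv in range(2)]
    for p in range(3)
]
host_kv = [[[[-1] * 4 for _ in range(2)] for _ in range(2)] for _ in range(2)]

# Device page 2 goes to host page 0, device page 0 goes to host page 1.
copy_kv_pages_d2h_model(device_kv, host_kv, [2, 0], [0, 1], layer_idx=1)
```

After the call, layer 1 of host pages 0 and 1 holds the data from device pages 2 and 0 respectively, while layer 0 on the host still holds its original fill value.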

The 6D tensor layout is: [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim]
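Because kv_dim comes before num_layers in this row-major layout, one layer of one page is not a single contiguous block: it consists of kv_dim separate chunks, each of page_size * num_heads * head_dim elements. The following Python sketch works out the element offsets. The helper name `layer_slab_ranges` and the example dimensions are hypothetical and chosen only for illustration.

```python
def layer_slab_ranges(page_id, layer_idx, kv_dim, num_layers,
                      page_size, num_heads, head_dim):
    """Return (start, end) element offsets, one pair per kv index, for the
    given page and layer in a row-major
    [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim] buffer."""
    # Elements in one contiguous [page_size, num_heads, head_dim] slab.
    slab = page_size * num_heads * head_dim
    ranges = []
    for kv in range(kv_dim):
        # Row-major flattening of the leading (page, kv, layer) indices.
        start = ((page_id * kv_dim + kv) * num_layers + layer_idx) * slab
        ranges.append((start, start + slab))
    return ranges


# Illustrative sizes only (assumed, not taken from the function's docs).
ranges = layer_slab_ranges(page_id=3, layer_idx=1, kv_dim=2, num_layers=4,
                           page_size=16, num_heads=8, head_dim=64)
print(ranges)  # [(204800, 212992), (237568, 245760)]
```

With these sizes the two chunks are each 8192 elements long and are separated by the slabs of the other layers, so a per-layer page copy under this layout involves kv_dim contiguous regions per page.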

Args: