Mojo function
copy_kv_pages_d2h
copy_kv_pages_d2h[dtype: DType](device_kv_blocks: LayoutTensor[dtype, Layout.row_major[6](), origin], host_kv_blocks: LayoutTensor[dtype, Layout.row_major[6](), origin], src_page_ids: LayoutTensor[DType.int64, Layout.row_major[1](), origin], dst_page_ids: LayoutTensor[DType.int64, Layout.row_major[1](), origin], layer_idx: Int, ctx: DeviceContext)
Copy selected pages for a single layer from device to host KV cache.
This function performs an asynchronous GPU→CPU copy using enqueue_copy. For each page, it copies only the specified layer. Source and destination page IDs are passed separately so that the device and host caches can use independent page ID spaces.
The 6D tensor layout is: [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim]
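To make the layout concrete, here is a minimal sketch of where one layer's data for one page sits in the flat buffer. It assumes a dense row-major tensor with no padding. The helper name `layer_slice_offset` and the example dimensions are hypothetical and are not part of the library API. Under this assumption, the data for a fixed page and layer consists of `kv_dim` separate contiguous runs, each `page_size * num_heads * head_dim` elements long.

```mojo
# Hypothetical helper, not part of the library: element offset of
# [page_id, kv_idx, layer_idx, 0, 0, 0] in a dense row-major tensor of shape
# [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim].
fn layer_slice_offset(
    page_id: Int,
    kv_idx: Int,
    layer_idx: Int,
    kv_dim: Int,
    num_layers: Int,
    page_size: Int,
    num_heads: Int,
    head_dim: Int,
) -> Int:
    # Number of contiguous elements holding one (page, kv, layer) slice.
    var slice_elems = page_size * num_heads * head_dim
    return ((page_id * kv_dim + kv_idx) * num_layers + layer_idx) * slice_elems


def main():
    # Example dimensions, chosen only for illustration.
    var kv_dim = 2
    var num_layers = 4
    var page_size = 16
    var num_heads = 8
    var head_dim = 64

    # One layer of one page spans kv_dim contiguous runs (for example K and V).
    for kv in range(kv_dim):
        var off = layer_slice_offset(
            3, kv, 1, kv_dim, num_layers, page_size, num_heads, head_dim
        )
        print("kv", kv, "offset", off)
```

With these example dimensions each run is 8192 elements long, and the two runs for page 3, layer 1 start at element offsets 204800 and 237568. Because source and destination page IDs are independent, the same calculation would be done once with the source page ID for the device buffer and once with the destination page ID for the host buffer.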
Args:
- device_kv_blocks (LayoutTensor): Source GPU KV cache blocks.
- host_kv_blocks (LayoutTensor): Destination CPU KV cache blocks.
- src_page_ids (LayoutTensor): 1D tensor of GPU (source) page IDs.
- dst_page_ids (LayoutTensor): 1D tensor of CPU (destination) page IDs.
- layer_idx (Int): Which layer to copy.
- ctx (DeviceContext): Device context for GPU operations.