Mojo function

copy_kv_pages_d2h

copy_kv_pages_d2h[dtype: DType](device_kv_blocks: LayoutTensor[dtype, Layout.row_major[6](), origin], host_kv_blocks: LayoutTensor[dtype, Layout.row_major[6](), origin], src_page_ids: LayoutTensor[DType.int64, Layout.row_major[1](), origin], dst_page_ids: LayoutTensor[DType.int64, Layout.row_major[1](), origin], layer_idx: Int, ctx: DeviceContext)

Copy selected pages for a single layer from device to host KV cache.

This function performs an asynchronous GPU→CPU copy using enqueue_copy. For each page, it copies only the specified layer. Source and destination page IDs are passed separately so that the device and host caches can use independent page ID spaces.

The 6D tensor layout is: [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim]
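To make the layout concrete, here is a minimal Python sketch of where one layer of one page lives in a row-major buffer of this shape. It is an illustration of the index arithmetic only, not the Mojo implementation; the helper name `layer_chunk_offsets` and the example shape are hypothetical. Because `kv_dim` sits outside `num_layers` in the layout, one (page, layer) pair maps to `kv_dim` separate contiguous chunks rather than one.

```python
def layer_chunk_offsets(shape, page, layer):
    """Return (offset, length) pairs, in elements, for one layer of one page.

    `shape` is (num_pages, kv_dim, num_layers, page_size, num_heads, head_dim),
    assumed to be stored row-major. Hypothetical helper for illustration.
    """
    num_pages, kv_dim, num_layers, page_size, num_heads, head_dim = shape
    if not 0 <= page < num_pages:
        raise IndexError("page out of range")
    if not 0 <= layer < num_layers:
        raise IndexError("layer out of range")

    # Elements in one [page_size, num_heads, head_dim] block.
    chunk = page_size * num_heads * head_dim

    # Row-major flattening of the index (page, kv, layer, 0, 0, 0).
    return [
        (((page * kv_dim + kv) * num_layers + layer) * chunk, chunk)
        for kv in range(kv_dim)
    ]


# Example shape (assumed values): 2 pages, kv_dim=2, 3 layers,
# page_size=4, 2 heads, head_dim=8.
print(layer_chunk_offsets((2, 2, 3, 4, 2, 8), page=1, layer=2))
# [(512, 64), (704, 64)]
```

Under this reading of the layout, copying a single layer of a page involves one contiguous region per `kv_dim` index.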

Args:

  • device_kv_blocks (LayoutTensor): Source GPU KV cache blocks.
  • host_kv_blocks (LayoutTensor): Destination CPU KV cache blocks.
  • src_page_ids (LayoutTensor): 1D int64 tensor of source (GPU) page IDs.
  • dst_page_ids (LayoutTensor): 1D int64 tensor of destination (CPU) page IDs.
  • layer_idx (Int): Which layer to copy.
  • ctx (DeviceContext): Device context for GPU operations.
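
The intended effect of the arguments can be modeled on the host with plain Python lists. This is a synchronous reference sketch, not the Mojo function: it assumes that `src_page_ids[i]` is paired with `dst_page_ids[i]`, which the description above suggests but does not state outright. The name `copy_pages_reference` and the `dims` tuple are hypothetical, and the real function enqueues asynchronous copies on `ctx` instead of copying immediately.

```python
def copy_pages_reference(device, host, dims, src_page_ids, dst_page_ids, layer_idx):
    """Synchronous reference model of a per-layer page copy.

    `device` and `host` are flat lists holding row-major data of shape
    [num_pages, kv_dim, num_layers, page_size, num_heads, head_dim].
    `dims` is (kv_dim, num_layers, page_size, num_heads, head_dim); the two
    buffers may have different page counts. Hypothetical helper.
    """
    kv_dim, num_layers, page_size, num_heads, head_dim = dims
    if len(src_page_ids) != len(dst_page_ids):
        raise ValueError("src_page_ids and dst_page_ids must have equal length")
    if not 0 <= layer_idx < num_layers:
        raise IndexError("layer_idx out of range")

    # Elements in one [page_size, num_heads, head_dim] block.
    chunk = page_size * num_heads * head_dim

    # Assumption: IDs are paired positionally (src[i] -> dst[i]).
    for src, dst in zip(src_page_ids, dst_page_ids):
        for kv in range(kv_dim):
            s = ((src * kv_dim + kv) * num_layers + layer_idx) * chunk
            d = ((dst * kv_dim + kv) * num_layers + layer_idx) * chunk
            host[d:d + chunk] = device[s:s + chunk]
    return host


# Device cache with 3 pages, host cache with 2 pages (8 elements per page).
dims = (2, 2, 1, 1, 2)
device = list(range(24))
host = [0] * 16
copy_pages_reference(device, host, dims, [2], [0], layer_idx=1)
print(host[:8])
# [0, 0, 18, 19, 0, 0, 22, 23]
```

Only layer 1 of host page 0 is written; the other layer and the other host page are left untouched, which matches the "single layer" behavior described above.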