Mojo function

cp_async_bulk_global_shared_cta

cp_async_bulk_global_shared_cta[dst_type: AnyType, src_type: AnyType, /, *, eviction_policy: CacheEviction = CacheEviction.EVICT_NORMAL](dst_mem: UnsafePointer[dst_type, dst_mem.origin, address_space=dst_mem.address_space], src_mem: UnsafePointer[src_type, src_mem.origin, address_space=AddressSpace.SHARED], size: Int32)

Initiates an asynchronous bulk copy from shared CTA memory to global memory.

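Performs a non-blocking copy of size bytes from shared memory to global memory using the cp.async.bulk PTX instruction with the .bulk_group completion mechanism. The call returns before the copy has completed. To synchronize, call cp_async_bulk_commit_group to commit the outstanding copies into a bulk group, then cp_async_bulk_wait_group to wait for that group. Both functions are in std.gpu.sync.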

Both dst_mem and src_mem must be 16-byte aligned, and size must be a multiple of 16. Requires sm_90 or higher.
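The size requirement above can be checked on the host before launching a kernel. The following is a minimal sketch, not part of the standard library: is_multiple_of_16 and round_up_to_16 are hypothetical helper names introduced here for illustration, and the 100-byte payload is an arbitrary example value.

```mojo
# Hypothetical helpers for illustration; they are not part of the Mojo
# standard library and are not used by cp_async_bulk_global_shared_cta.


fn is_multiple_of_16(size: Int) -> Bool:
    # The bulk copy requires the byte count to be a multiple of 16.
    return size % 16 == 0


fn round_up_to_16(size: Int) -> Int:
    # Rounds a non-negative byte count up to the next multiple of 16.
    return ((size + 15) // 16) * 16


fn main():
    var payload_bytes = 100  # arbitrary example value
    print(is_multiple_of_16(payload_bytes))  # False

    var padded_bytes = round_up_to_16(payload_bytes)
    print(padded_bytes)  # 112
    print(is_multiple_of_16(padded_bytes))  # True
```

Rounding the size up only makes sense if both the shared and global buffers really hold that many bytes, since the copy transfers the full padded count. The helpers check the size only; the 16-byte alignment of dst_mem and src_mem has to be guaranteed separately when the buffers are allocated.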

Parameters:

  • dst_type (AnyType): The element type of the destination global memory.
  • src_type (AnyType): The element type of the source shared memory.
  • eviction_policy (CacheEviction): Cache eviction policy for the L2 cache. Defaults to EVICT_NORMAL.

Args:

  • dst_mem (UnsafePointer): Destination pointer in global or generic memory (16-byte aligned).
  • src_mem (UnsafePointer): Source pointer in shared CTA memory (16-byte aligned).
  • size (Int32): Number of bytes to copy (must be a multiple of 16).
