Mojo function
cp_async_bulk_shared_cta_global
cp_async_bulk_shared_cta_global[dst_type: AnyType, src_type: AnyType, mbr_type: AnyType, /, *, eviction_policy: CacheEviction = CacheEviction.EVICT_NORMAL](dst_mem: UnsafePointer[dst_type, dst_mem.origin, address_space=AddressSpace.SHARED], src_mem: UnsafePointer[src_type, src_mem.origin, address_space=src_mem.address_space], size: Int32, mem_bar: UnsafePointer[mbr_type, mem_bar.origin, address_space=AddressSpace.SHARED])
Initiates an asynchronous bulk copy from global memory to shared CTA memory.
Performs a non-blocking copy of size bytes from global memory to shared
memory using the cp.async.bulk PTX instruction. Completion is signaled
via the mbarrier specified by mem_bar.
Both dst_mem and src_mem must be 16-byte aligned, and size must be a
multiple of 16. Requires sm_90 or higher.
Parameters:
- dst_type (AnyType): The element type of the destination shared memory.
- src_type (AnyType): The element type of the source global memory.
- mbr_type (AnyType): The element type of the mbarrier object in shared memory.
- eviction_policy (CacheEviction): Cache eviction policy for the L2 cache. Defaults to EVICT_NORMAL.
Args:
- dst_mem (UnsafePointer): Destination pointer in shared CTA memory (16-byte aligned).
- src_mem (UnsafePointer): Source pointer in global or generic memory (16-byte aligned).
- size (Int32): Number of bytes to copy (must be a multiple of 16).
- mem_bar (UnsafePointer): Pointer to the mbarrier object in shared memory used to signal completion.
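The size requirement above is easy to get wrong when a transfer length is computed from an element count. The following is a minimal host-side sketch, not part of this API: the helper names round_up_to_16 and is_valid_bulk_copy_size are hypothetical and exist only to illustrate checking or padding a byte count before it is passed as size.

```mojo
# Hypothetical helpers (not part of the Mojo standard library) that
# illustrate the "multiple of 16 bytes" rule for the size argument.

fn round_up_to_16(n: Int) -> Int:
    # Round a byte count up to the next multiple of 16.
    return (n + 15) // 16 * 16


fn is_valid_bulk_copy_size(n: Int) -> Bool:
    # A byte count is acceptable if it is positive and a multiple of 16.
    return n > 0 and n % 16 == 0


def main():
    # 25 Float32 elements occupy 100 bytes, which is not a multiple of 16.
    var raw_bytes = 25 * 4
    print(is_valid_bulk_copy_size(raw_bytes))  # False
    print(round_up_to_16(raw_bytes))  # 112
    print(is_valid_bulk_copy_size(round_up_to_16(raw_bytes)))  # True
```

Padding the byte count is only safe when both buffers are allocated to the padded length. The 16-byte alignment of dst_mem and src_mem is a separate requirement that this sketch does not check.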