Mojo function

cp_async_bulk_tensor_reduce

cp_async_bulk_tensor_reduce[src_type: AnyType, rank: Int, /, *, reduction_kind: StringLiteral, eviction_policy: CacheEviction = 0](src_mem: UnsafePointer[src_type, address_space=3], tma_descriptor: UnsafePointer[NoneType], coords: IndexList[rank])

Initiates an asynchronous reduction of tensor data in global memory with tensor data in shared (shared::cta) memory, using tile mode.

Args:

  • src_mem (UnsafePointer[src_type, address_space=3]): Pointer to source shared memory.
  • tma_descriptor (UnsafePointer[NoneType]): Pointer to tensor map descriptor.
  • coords (IndexList[rank]): Tile coordinates.
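
To make the semantics concrete, the following is a minimal host-side sketch in Python of what a tile-mode reduction does to the destination tensor. It is a model only, not the Mojo or PTX API: the helper `tile_reduce_2d`, the `REDUCTIONS` table, the reduction names, the (row, col) coordinate order, and the skipping of out-of-bounds elements are all assumptions made for illustration. On the GPU the tile shape comes from the tensor map descriptor and the operation runs asynchronously.

```python
# Host-side model only: this is NOT the Mojo or PTX API.
import operator

# Assumed reduction names; this page does not list the accepted
# `reduction_kind` strings.
REDUCTIONS = {
    "add": operator.add,
    "min": min,
    "max": max,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def tile_reduce_2d(global_tensor, shared_tile, coords, reduction_kind):
    """Combine shared_tile into global_tensor in place, starting at coords.

    coords is (row, col) in this model; the real coordinate order is
    defined by the tensor map descriptor.
    """
    op = REDUCTIONS[reduction_kind]
    row0, col0 = coords
    n_rows, n_cols = len(global_tensor), len(global_tensor[0])
    for i, tile_row in enumerate(shared_tile):
        for j, value in enumerate(tile_row):
            r, c = row0 + i, col0 + j
            # Assumption: elements that fall outside the global tensor
            # are skipped rather than written.
            if 0 <= r < n_rows and 0 <= c < n_cols:
                global_tensor[r][c] = op(global_tensor[r][c], value)
    return global_tensor


# Hypothetical 3x4 destination tensor and 2x2 source tile.
global_tensor = [[1, 1, 1, 1] for _ in range(3)]
shared_tile = [[10, 20], [30, 40]]
tile_reduce_2d(global_tensor, shared_tile, (1, 2), "add")
```

In this model only the 2x2 region starting at row 1, column 2 is modified; every other element of the destination keeps its previous value.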