Mojo struct
ScatterGatherCPAsync
@register_passable(trivial)
struct ScatterGatherCPAsync[dtype: DType, src_layout: Layout, thread_layout: Layout, swizzle_mode: TensorMapSwizzle, vector_size: Int]
Software-based tile loader using cp.async instructions.
This loader uses CUDA's cp.async instructions for asynchronous memory transfers with manual bounds checking and shared memory swizzling for optimal bank conflict avoidance.
Parameters
- dtype (
DType
): Data type of the elements being loaded. - src_layout (
Layout
): Layout of the source matrix in global memory. - thread_layout (
Layout
): Thread arrangement for distributed copying. - swizzle_mode (
TensorMapSwizzle
): Swizzling pattern for shared memory access. - vector_size (
Int
): Number of elements loaded per thread.
Fields
- src (
LayoutTensor[dtype, src_layout, MutableAnyOrigin]
):
Implemented traits
AnyType
,
Copyable
,
ImplicitlyCopyable
,
Movable
,
ScatterGather
,
UnknownDestructibility
Aliases
__copyinit__is_trivial
alias __copyinit__is_trivial = True
__del__is_trivial
alias __del__is_trivial = True
__moveinit__is_trivial
alias __moveinit__is_trivial = True
Methods
__init__
__init__(src: LayoutTensor[dtype, src_layout, MutableAnyOrigin]) -> Self
Initialize the cp.async tile loader.
Args:
- src (
LayoutTensor
): Source tensor in global memory.
load_tile
load_tile(self, dst: LayoutTensor[dtype, layout, MutableAnyOrigin, address_space=AddressSpace(3), alignment=128], mem_barrier: UnsafePointer[SharedMemBarrier, address_space=AddressSpace(3)], coords: Tuple[UInt, UInt])
Load a tile using cp.async instructions.
Extracts a tile from the source tensor and performs an asynchronous copy to shared memory with bounds checking and swizzling.
Note: Unlike TMA, this method expects tile indices and handles the conversion to element offsets internally via the tile() method.
Args:
- dst (
LayoutTensor
): Destination tile in shared memory. - mem_barrier (
UnsafePointer
): Memory barrier for synchronization (currently unused). - coords (
Tuple
): Tile indices (row_tile, col_tile) in the source matrix.
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!