Skip to main content

Mojo struct

ScatterGatherCPAsync

@register_passable(trivial) struct ScatterGatherCPAsync[dtype: DType, src_layout: Layout, thread_layout: Layout, swizzle_mode: TensorMapSwizzle, vector_size: Int]

Software-based tile loader using cp.async instructions.

This loader uses CUDA's cp.async instructions for asynchronous memory transfers with manual bounds checking and shared memory swizzling for optimal bank conflict avoidance.

Parameters

  • dtype (DType): Data type of the elements being loaded.
  • src_layout (Layout): Layout of the source matrix in global memory.
  • thread_layout (Layout): Thread arrangement for distributed copying.
  • swizzle_mode (TensorMapSwizzle): Swizzling pattern for shared memory access.
  • vector_size (Int): Number of elements loaded per thread.

Fields

  • src (LayoutTensor[dtype, src_layout, MutableAnyOrigin]):

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, Movable, ScatterGather, UnknownDestructibility

Aliases

__copyinit__is_trivial

alias __copyinit__is_trivial = True

__del__is_trivial

alias __del__is_trivial = True

__moveinit__is_trivial

alias __moveinit__is_trivial = True

Methods

__init__

__init__(src: LayoutTensor[dtype, src_layout, MutableAnyOrigin]) -> Self

Initialize the cp.async tile loader.

Args:

load_tile

load_tile(self, dst: LayoutTensor[dtype, layout, MutableAnyOrigin, address_space=AddressSpace(3), alignment=128], mem_barrier: UnsafePointer[SharedMemBarrier, address_space=AddressSpace(3)], coords: Tuple[UInt, UInt])

Load a tile using cp.async instructions.

Extracts a tile from the source tensor and performs an asynchronous copy to shared memory with bounds checking and swizzling.

Note: Unlike TMA, this method expects tile indices and handles the conversion to element offsets internally via the tile() method.

Args:

  • dst (LayoutTensor): Destination tile in shared memory.
  • mem_barrier (UnsafePointer): Memory barrier for synchronization (currently unused).
  • coords (Tuple): Tile indices (row_tile, col_tile) in the source matrix.

Was this page helpful?