Mojo struct

AmdWarpBlockScatterGather

struct AmdWarpBlockScatterGather[SmemType: DType, thread_layout: Layout, warp_tile_layout: Layout, simd_width: Int, is_a: Bool, warp_rows: Int, warp_cols: Int, swizzle: OptionalReg[Swizzle] = None]

Transports data from global memory to registers and then to shared memory (global -> register -> shared). The work is divided by warp tile: each warp is responsible for moving one warp block of shared memory (smem).

Fields

fragment (LayoutTensor[SmemType, Layout.row_major(((warp_tile_layout.size() // thread_layout.size()) // simd_width), simd_width), MutableAnyOrigin, address_space=AddressSpace(5)]):

Implemented traits

AnyType, UnknownDestructibility

Aliases

`__del__is_trivial`

alias __del__is_trivial = True

`elements_loaded_per_thread`

alias elements_loaded_per_thread = (warp_tile_layout.size() // thread_layout.size())

`LoadFragmentType`

alias LoadFragmentType = LayoutTensor[SmemType, Layout.row_major(((warp_tile_layout.size() // thread_layout.size()) // simd_width), simd_width), MutableAnyOrigin, address_space=AddressSpace(5)]

`simd_loads_per_thread`

alias simd_loads_per_thread = ((warp_tile_layout.size() // thread_layout.size()) // simd_width)

`total_participating_threads`

alias total_participating_threads = thread_layout.size()
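The alias arithmetic above can be checked with a small standalone sketch. The sizes below (a 32 x 64 warp tile, 64 threads, a SIMD width of 8) are hypothetical values chosen only for illustration; they are not taken from the library, and the plain integers stand in for `warp_tile_layout.size()` and `thread_layout.size()`.

```mojo
fn main():
    # Hypothetical sizes, assumed for illustration only.
    var warp_tile_size = 32 * 64  # stands in for warp_tile_layout.size()
    var thread_count = 64         # stands in for thread_layout.size()
    var simd_width = 8

    # Same integer divisions as the aliases documented above.
    var total_participating_threads = thread_count
    var elements_loaded_per_thread = warp_tile_size // thread_count
    var simd_loads_per_thread = elements_loaded_per_thread // simd_width

    print("total_participating_threads:", total_participating_threads)  # 64
    print("elements_loaded_per_thread:", elements_loaded_per_thread)    # 32
    print("simd_loads_per_thread:", simd_loads_per_thread)              # 4

    # LoadFragmentType is row_major(simd_loads_per_thread, simd_width),
    # so under these assumed sizes each thread's fragment would be 4 x 8.
    print("fragment shape:", simd_loads_per_thread, "x", simd_width)
```

Because both divisions are integer divisions, the formulas only account for every element when the warp tile size is a multiple of the thread count and the per-thread element count is a multiple of `simd_width`; otherwise the remainder is silently dropped.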

Methods

`__init__`

__init__(out self)

`load_compute_tile`

load_compute_tile[GmemType: DType, GmemLayout: Layout](mut self, mut cache_manager: RingBuffer[SmemBufferTypeA, SmemBufferTypeB, consumer_warps], mut phase: Int, gmem_tile: LayoutTensor[GmemType, GmemLayout, MutableAnyOrigin, address_space=AddressSpace(1)], stage: Int, tile_idx: Int)
