Mojo module
tile_loader
TMA (Tensor Memory Accelerator) tile loader for SM100 matrix multiplication.
Provides a thin wrapper around TMA async_multicast_load operations, following the SM90 TileLoaderTMA pattern. The loader only issues loads; orchestration logic (k-group iteration, expect_bytes, and barrier management) is handled by the kernel.
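The split of responsibilities described above can be illustrated with a small model in plain Python. This is a hypothetical, host-side sketch, not the Mojo API: Barrier, TileLoader, run_k_loop, and the tile sizes are illustrative names and values, and load() only records the copy instead of issuing async_multicast_load. It shows each loader holding only per-operand state (a name, a tile size, and a multicast mask), while the kernel loop iterates over k-groups, arms each barrier with expect_bytes, and then issues the A and B loads.

```python
from dataclasses import dataclass, field


@dataclass
class Barrier:
    """Hypothetical stand-in for a transaction-count barrier."""
    expected: int = 0
    arrived: int = 0

    def expect_bytes(self, n: int) -> None:
        # Kernel arms the barrier with the number of bytes it will receive.
        self.expected += n

    def complete_tx(self, n: int) -> None:
        # Each finished copy credits its bytes to the barrier.
        self.arrived += n

    def ready(self) -> bool:
        return self.expected > 0 and self.arrived == self.expected


@dataclass
class TileLoader:
    """Hypothetical thin loader: stores per-operand state and only issues copies."""
    name: str
    tile_bytes: int
    multicast_mask: int
    log: list = field(default_factory=list)

    def load(self, barrier: Barrier, k_coord: int, mn_coord: int) -> None:
        # Stand-in for async_multicast_load: record the copy and credit the barrier.
        self.log.append((self.name, k_coord, mn_coord, self.multicast_mask))
        barrier.complete_tx(self.tile_bytes)


def run_k_loop(a_loader, b_loader, num_k_groups, m_coord, n_coord, bk):
    """Kernel-side orchestration: k-group iteration, expect_bytes, barrier setup."""
    barriers = []
    for k in range(num_k_groups):
        barrier = Barrier()
        # The kernel, not the loader, decides how many bytes land on this barrier.
        barrier.expect_bytes(a_loader.tile_bytes + b_loader.tile_bytes)
        a_loader.load(barrier, k * bk, m_coord)
        b_loader.load(barrier, k * bk, n_coord)
        barriers.append(barrier)
    return barriers


# Example: 128x64 bf16 tiles (2 bytes per element) for both A and B.
a_loader = TileLoader("A", tile_bytes=128 * 64 * 2, multicast_mask=0b01)
b_loader = TileLoader("B", tile_bytes=128 * 64 * 2, multicast_mask=0b10)
barriers = run_k_loop(a_loader, b_loader, num_k_groups=4, m_coord=0, n_coord=128, bk=64)
print(barriers[0].expected)  # 32768
```

The model simplifies two things: it creates a fresh barrier per k-group instead of cycling through a pipeline of reusable barriers, and it ignores multicast delivery to other CTAs, so each tile is credited only to the local barrier.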
Usage:

```mojo
# In kernel: create separate A and B loaders
var a_loader = ATileLoaderType(Pointer(to=a_tma_op), ctx.a_multicast_mask)
var b_loader = BTileLoaderType(Pointer(to=b_tma_op), ctx.b_multicast_mask)

# Load tiles using the loaders
a_loader.load(a_tile, barrier, k_coord, m_coord)
b_loader.load(b_tile, barrier, k_coord, n_coord)
```

Structs
- TileLoaderTMA: TMA-based tile loader for SM100.