Mojo module
tile_loader
TileLoader for SM100 matrix multiplication.
Provides tile-loading abstractions for efficient global-to-shared-memory transfers using TMA, with support for:
- K-group batching (multiple tiles per barrier synchronization)
- CTA group coordination (1-SM or 2-SM cooperative loading)
- Multicast for cluster distribution
Usage:

    var loader = TileLoaderTMA[...](a_tma_op, b_tma_op, masks, peer_coord)
    loader.set_work_tile(m_coord, n_coord)

    with producer.get_tiles() as tiles:
        loader.load_tiles(tiles, k_iter, elect_one_cta)

Structs
- TileLoaderTMA: TMA-based tile loader for SM100.
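
To make the K-group batching idea concrete, here is a minimal sketch in plain Python (not Mojo, and not the `tile_loader` API). It only models how consecutive K tiles could be grouped so that each group shares one barrier synchronization. The function name `k_group_schedule`, its parameters, and the handling of a short final group are all assumptions made for illustration; the module itself may require the group size to divide the K-tile count evenly.

```python
def k_group_schedule(num_k_tiles, k_group_size):
    """Group consecutive K-tile indices; each group stands for one barrier sync.

    Hypothetical helper for illustration only, not part of tile_loader.
    """
    if num_k_tiles < 0 or k_group_size <= 0:
        raise ValueError("num_k_tiles must be >= 0 and k_group_size must be > 0")
    groups = []
    for start in range(0, num_k_tiles, k_group_size):
        # Assumption: a final partial group is allowed and simply holds fewer tiles.
        end = min(start + k_group_size, num_k_tiles)
        groups.append(list(range(start, end)))
    return groups


# 10 K tiles loaded 4 per barrier need 3 synchronizations instead of 10.
schedule = k_group_schedule(10, 4)
for barrier_index, tiles in enumerate(schedule):
    print(f"barrier {barrier_index}: K tiles {tiles}")
```

In this model, a larger group size trades fewer synchronizations for more shared memory held per pipeline stage, which is the usual motivation for batching tiles per barrier.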