Mojo module

tile_loader

TileLoader for SM100 matrix multiplication.

Provides tile-loading abstractions for efficient global-to-shared-memory transfers using TMA (Tensor Memory Accelerator), with support for:

  • K-group batching (multiple tiles per barrier synchronization)
  • CTA group coordination (1-SM or 2-SM cooperative loading)
  • Multicast for cluster distribution
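To make the K-group batching idea concrete, here is a minimal host-side sketch in plain Python. It is not Mojo and not this module's API: it only models how K-tile indices might be partitioned so that each group shares a single barrier synchronization. The names `k_groups`, `num_k_tiles`, and `k_group_size` are hypothetical, and the sketch assumes a partial final group is allowed, which the real loader may not permit.

```python
def k_groups(num_k_tiles: int, k_group_size: int) -> list[range]:
    """Partition K-tile indices into groups that each share one barrier.

    Hypothetical model only: the real loader may require num_k_tiles
    to be a multiple of k_group_size.
    """
    if num_k_tiles < 0 or k_group_size <= 0:
        raise ValueError("num_k_tiles must be >= 0 and k_group_size > 0")
    return [
        range(start, min(start + k_group_size, num_k_tiles))
        for start in range(0, num_k_tiles, k_group_size)
    ]


# 10 K tiles batched 4 at a time need 3 barrier synchronizations, not 10.
groups = k_groups(num_k_tiles=10, k_group_size=4)
for barrier_idx, group in enumerate(groups):
    print(f"barrier {barrier_idx}: K tiles {list(group)}")
```

In this model the number of synchronizations is the ceiling of `num_k_tiles / k_group_size`, which is the saving that batching several tiles per barrier is meant to provide.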

Usage:

var loader = TileLoaderTMA[...](a_tma_op, b_tma_op, masks, peer_coord)
loader.set_work_tile(m_coord, n_coord)

with producer.get_tiles() as tiles:
    loader.load_tiles(tiles, k_iter, elect_one_cta)

Structs