Mojo module
matmul_mma
MMA and data-movement helpers for AMD matmul kernels.
Structs:
- TiledMma: Stateless MMA computation on TileTensors (mirrors TiledTensorCore.mma). Pure computation, no register ownership.
- MmaOp: Register ownership + SMEM loading + schedule API. Wraps TiledMma for per-k-tile load_frag/mma dispatch.
- QuadrantMmaOp: Owns A/B/C register tiles in LOCAL, provides quadrant load/compute methods for ping-pong double-buffering schedule.
- TileLoaderLDS: Cooperative global→LDS loader via buffer_load_to_lds.
Structs
- MmaOp: Register ownership + SMEM loading + schedule API for AMD matmul.
- QuadrantMmaOp: MMA operator for AMD matmul ping-pong schedule.
- TiledMma: Stateless MMA computation on TileTensors.
- TileLoaderLDS: Cooperative global→LDS loader via load_to_lds.
Functions
- load_lds_fragment: Load MMA fragments from SMEM to registers using hardware access pattern.