Mojo module
amd_matmul
Pure TileTensor structured AMD matmul kernel.
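The kernel uses RegTileLoader to load tiles from DRAM into registers, a blocked-product shared-memory (SMEM) layout with Swizzle(3,0,1), StructuredMmaOp for the MMA on each k-tile, and RegTileWriter to write the output. The pipeline is schedule-driven; the schedule is built by build_default_matmul_schedule.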
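To give a feel for what a swizzle such as Swizzle(3,0,1) does to SMEM offsets, here is a minimal Python sketch. It assumes the arguments are (bits, base, shift) and that the swizzle follows the common XOR convention, where `bits` bits starting at position `base + shift` are shifted right by `shift` and XOR-ed into the offset. This page does not confirm that convention; the helper name `make_swizzle` is hypothetical and not part of the amd_matmul module.

```python
def make_swizzle(bits: int, base: int, shift: int):
    """Build an offset-remapping function.

    ASSUMPTION: XOR-swizzle convention with parameters (bits, base, shift).
    This is an illustration only, not the module's actual implementation.
    """
    if shift < 0:
        raise ValueError("this sketch only handles non-negative shifts")

    # Mask selecting `bits` source bits starting at bit position base + shift.
    src_mask = ((1 << bits) - 1) << (base + shift)

    def swizzle(offset: int) -> int:
        # Shift the selected bits down by `shift` and XOR them into the offset.
        return offset ^ ((offset & src_mask) >> shift)

    return swizzle


# Hypothetical stand-in for Swizzle(3, 0, 1).
swizzle_301 = make_swizzle(3, 0, 1)

# Remap the first 16 linear offsets and show the resulting permutation.
remapped = [swizzle_301(i) for i in range(16)]
print(remapped)
```

Under these assumptions the mapping is a permutation of the 16 offsets, so every element keeps a unique slot while its position changes. Swizzles like this are typically used to reduce shared-memory bank conflicts.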
Entry point: AMDMatmul.run()
comptime values
SCHED_MASK_DS_READ
comptime SCHED_MASK_DS_READ = 0
SCHED_MASK_DS_WRITE
comptime SCHED_MASK_DS_WRITE = 1
SCHED_MASK_MFMA
comptime SCHED_MASK_MFMA = 3
SCHED_MASK_VMEM_READ
comptime SCHED_MASK_VMEM_READ = 2
Structs
- AMDMatmul: Pure TileTensor structured matmul for AMD GPUs.