Mojo module

amd_matmul

Pure TileTensor structured AMD matmul kernel.

Uses RegTileLoader for DRAM-to-register loads, a blocked-product shared-memory (SMEM) layout with Swizzle(3,0,1), StructuredMmaOp for the per-k-tile MMA, and RegTileWriter for output. The pipeline is schedule-driven, with the schedule built by build_default_matmul_schedule.

Entry point: AMDMatmul.run()

comptime values

SCHED_MASK_DS_READ

comptime SCHED_MASK_DS_READ = 0

SCHED_MASK_DS_WRITE

comptime SCHED_MASK_DS_WRITE = 1

SCHED_MASK_MFMA

comptime SCHED_MASK_MFMA = 3

SCHED_MASK_VMEM_READ

comptime SCHED_MASK_VMEM_READ = 2

Structs