Mojo module

amd_matmul

Pure TileTensor structured AMD matmul kernel.

The kernel uses RegTileLoader for DRAM-to-register loads, a blocked-product shared-memory (SMEM) layout with Swizzle(3,0,1), StructuredMmaOp for the per-k-tile MMA, and RegTileWriter for the output. The pipeline is schedule-driven and is built by build_default_matmul_schedule.
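To give a feel for what a swizzle with parameters (3, 0, 1) does to an offset, here is a minimal sketch in plain Python. It assumes the three arguments mean (bits, base, shift) and that the mapping follows the common CuTe-style XOR formula, `offset ^ ((offset & mask) >> shift)`. Neither assumption is confirmed by this page, and the function name `swizzle` is hypothetical, so check the Swizzle documentation for the actual semantics.

```python
def swizzle(offset: int, bits: int = 3, base: int = 0, shift: int = 1) -> int:
    """CuTe-style XOR swizzle (assumed convention, positive shift only).

    Takes `bits` bits starting at position base + shift, shifts them
    right by `shift`, and XORs them into the original offset.
    """
    bit_mask = (1 << bits) - 1               # 0b111 for bits=3
    src_mask = bit_mask << (base + shift)    # 0b1110 for base=0, shift=1
    return offset ^ ((offset & src_mask) >> shift)


if __name__ == "__main__":
    # Print how the first 16 offsets are permuted under the assumed formula.
    for off in range(16):
        print(off, "->", swizzle(off))
```

Under these assumptions the mapping only permutes offsets within each aligned group of 16, so no two offsets collide.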

Entry point: AMDMatmul.run()

comptime values

SCHED_MASK_DS_READ

comptime SCHED_MASK_DS_READ = 0

SCHED_MASK_DS_WRITE

comptime SCHED_MASK_DS_WRITE = 1

SCHED_MASK_MFMA

comptime SCHED_MASK_MFMA = 3

SCHED_MASK_VMEM_READ

comptime SCHED_MASK_VMEM_READ = 2

Structs

  • AMDMatmul: Pure TileTensor structured matmul for AMD GPUs.