Mojo module
matmul_schedule
Declarative software pipeline schedule for the default AMD matmul kernel.
This module defines the Loop Dependency Graph (LDG), schedule builder, and schedule hint derivation for the single-buffer matmul pipeline in matmul.mojo.
Architecture (single-buffer, barrier-gated pipeline):
Prologue: load_dram → store_smem → barrier → load_dram(prefetch) → load_frag[0]
Kernel body (per K-loop iteration, num_k_tiles=T): load_frag[1..T-1], compute[0], barrier, store_smem, load_dram(prefetch), compute[1..T-1], barrier, load_frag[0], schedule_group_barrier hints
Epilogue (2 drain iterations):
- Drain 1: load_frag[1..T-1], barrier, store_smem, compute[0..T-1]
- Drain 2: barrier, load_frag[0..T-1], compute[0..T-1]
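The op order above can be modelled in a few lines of plain Python, which may help when checking how many ops each phase issues for a given T. This is only a sketch: the function name `single_buffer_op_order` and the `num_body_iters` parameter are hypothetical, the ops are plain strings rather than the module's `Pipe` or op-tag types, and the schedule_group_barrier hints are left out.

```python
def single_buffer_op_order(num_k_tiles, num_body_iters):
    """Return the documented op order as a flat list of strings.

    Hypothetical model for illustration; not part of matmul_schedule.
    """
    T = num_k_tiles

    def frags(start, end):
        # load_frag ops for k-tiles start..end-1
        return [f"load_frag[{k}]" for k in range(start, end)]

    def computes(start, end):
        # compute ops for k-tiles start..end-1
        return [f"compute[{k}]" for k in range(start, end)]

    ops = []

    # Prologue: fill SMEM, prefetch the next tile, load the first fragment.
    ops += ["load_dram", "store_smem", "barrier", "load_dram", "load_frag[0]"]

    # Kernel body, once per K-loop iteration.
    for _ in range(num_body_iters):
        ops += frags(1, T)
        ops += computes(0, 1)
        ops += ["barrier", "store_smem", "load_dram"]
        ops += computes(1, T)
        ops += ["barrier", "load_frag[0]"]

    # Drain 1: finish the current tile and store the last prefetched tile.
    ops += frags(1, T) + ["barrier", "store_smem"] + computes(0, T)

    # Drain 2: consume the final tile from SMEM.
    ops += ["barrier"] + frags(0, T) + computes(0, T)

    return ops


if __name__ == "__main__":
    for op in single_buffer_op_order(num_k_tiles=2, num_body_iters=1):
        print(op)
```

In this model every body iteration and every drain iteration issues T compute ops, so the total compute count is T × (num_body_iters + 2).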
Key differences from ping-pong matmul:
- Single SMEM buffer (barriers gate read/write phases, no double-buffering)
- All warps identical (no warp groups or stagger)
- Bundled ops: load_dram=A+B, load_frag=A+B, store_smem=A+B
- Iterator-based K advancement (no KOffsetKind)
- schedule_group_barrier hints instead of schedule_barrier fences
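The first bullet implies an invariant: with a single SMEM buffer, a write phase (store_smem) and a read phase (load_frag) should never follow one another without a barrier in between. The plain-Python check below sketches that invariant over a list of op names. The function `barriers_gate_smem` is hypothetical, and it assumes that load_dram and compute do not access SMEM, which this page does not state explicitly.

```python
def barriers_gate_smem(ops):
    """Return True if every switch between SMEM writes and reads crosses a barrier.

    Hypothetical checker for illustration; not part of matmul_schedule.
    """
    last_access = None     # "read" or "write": kind of the most recent SMEM access
    barrier_since = False  # has a barrier been seen since that access?

    for op in ops:
        if op == "barrier":
            barrier_since = True
            continue
        if op.startswith("load_frag"):
            kind = "read"
        elif op == "store_smem":
            kind = "write"
        else:
            continue  # assumption: load_dram and compute do not touch SMEM

        # Switching between reading and writing requires a barrier in between.
        if last_access is not None and kind != last_access and not barrier_since:
            return False

        last_access = kind
        barrier_since = False

    return True


# Prologue plus one kernel-body iteration for T=2, as listed in the architecture.
example = [
    "load_dram", "store_smem", "barrier", "load_dram", "load_frag[0]",
    "load_frag[1]", "compute[0]", "barrier", "store_smem", "load_dram",
    "compute[1]", "barrier", "load_frag[0]",
]
```

Calling `barriers_gate_smem(example)` returns True, while a sequence such as `["store_smem", "load_frag[0]"]` is rejected because the read follows the write with no barrier.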
comptime values
COMPUTE
comptime COMPUTE = DefaultMatmulOps.COMPUTE.value
LOAD_DRAM
comptime LOAD_DRAM = DefaultMatmulOps.LOAD_DRAM.value
LOAD_FRAG
comptime LOAD_FRAG = DefaultMatmulOps.LOAD_FRAG.value
STORE_SMEM
comptime STORE_SMEM = DefaultMatmulOps.STORE_SMEM.value
Structs
- DefaultMatmulOps: Op tags for the default single-buffer matmul kernel.
- SingleBufferSchedule: Declarative schedule for the default single-buffer matmul.
Functions
- build_default_matmul_schedule: Build the complete software pipeline schedule for the default matmul.
- compute_range: Build a Pipe of compute ops for k-tiles start..end-1.
- load_frags: Build a Pipe of load_frag ops for k-tiles start..end-1.