
Mojo module

matmul_schedule

Declarative software pipeline schedule for the default AMD matmul kernel.

This module defines the Loop Dependency Graph (LDG), schedule builder, and schedule hint derivation for the single-buffer matmul pipeline in matmul.mojo.

Architecture (single-buffer, barrier-gated pipeline):

Prologue: load_dram → store_smem → barrier → load_dram(prefetch) → load_frag[0]

Kernel body (per K-loop iteration, num_k_tiles=T): load_frag[1..T-1], compute[0], barrier, store_smem, load_dram(prefetch), compute[1..T-1], barrier, load_frag[0], schedule_group_barrier hints

Epilogue (2 drain iterations):

  • Drain 1: load_frag[1..T-1], barrier, store_smem, compute[0..T-1]
  • Drain 2: barrier, load_frag[0..T-1], compute[0..T-1]
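The prologue, kernel body, and epilogue listed above can be modeled as a flat operation trace. The following is a minimal Python sketch, not the module's Mojo API: `build_trace` and `check_trace` are hypothetical helper names, and `body_iters` (the number of kernel-body iterations) is an assumed parameter. The checker encodes two properties the description implies: a single SMEM buffer must not mix `store_smem` and `load_frag` between two barriers, and each `compute[i]` must consume a fragment that a preceding `load_frag[i]` produced.

```python
def build_trace(num_k_tiles, body_iters):
    """Flatten prologue, kernel body, and the two drain iterations into one op list."""
    T = num_k_tiles
    bar, store, dram = ("barrier", None), ("store_smem", None), ("load_dram", None)

    def frag(lo, hi):
        return [("load_frag", i) for i in range(lo, hi)]

    def comp(lo, hi):
        return [("compute", i) for i in range(lo, hi)]

    # Prologue: load_dram, store_smem, barrier, load_dram(prefetch), load_frag[0]
    trace = [dram, store, bar, dram] + frag(0, 1)
    # Kernel body, repeated body_iters times
    for _ in range(body_iters):
        trace += frag(1, T) + comp(0, 1) + [bar, store, dram]
        trace += comp(1, T) + [bar] + frag(0, 1)
    # Drain 1, then Drain 2
    trace += frag(1, T) + [bar, store] + comp(0, T)
    trace += [bar] + frag(0, T) + comp(0, T)
    return trace


def check_trace(trace):
    """Return True if the trace respects the single-buffer and fragment rules."""
    ready = set()   # fragment indices loaded but not yet consumed
    phase = None    # "read" or "write" since the last barrier
    for name, idx in trace:
        if name == "barrier":
            phase = None
        elif name == "store_smem":
            if phase == "read":      # overwriting SMEM that is still being read
                return False
            phase = "write"
        elif name == "load_frag":
            if phase == "write" or idx in ready:
                return False         # reading unsynchronized SMEM, or clobbering a fragment
            phase = "read"
            ready.add(idx)
        elif name == "compute":
            if idx not in ready:     # fragment was never loaded
                return False
            ready.remove(idx)
    return not ready                 # every loaded fragment was consumed


trace = build_trace(num_k_tiles=4, body_iters=3)
print(check_trace(trace))
```

Under this model the trace issues `(body_iters + 2) * T` compute ops and `body_iters + 2` `load_dram` and `store_smem` ops each, which matches a prologue that prefetches one tile ahead and an epilogue that drains two tiles. Removing any barrier from the trace makes `check_trace` return `False`.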

Key differences from ping-pong matmul:

  • Single SMEM buffer (barriers gate read/write phases, no double-buffering)
  • All warps identical (no warp groups or stagger)
  • Bundled ops: load_dram=A+B, load_frag=A+B, store_smem=A+B
  • Iterator-based K advancement (no KOffsetKind)
  • schedule_group_barrier hints instead of schedule_barrier fences
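The first point, a single buffer whose read and write phases are separated by barriers, can be illustrated on the CPU. This is a rough Python analogy using `threading.Barrier`, not GPU code and not part of this module: `run_single_buffer`, `smem`, and the worker threads standing in for warps are all assumptions made for illustration.

```python
import threading


def run_single_buffer(num_workers=4, num_tiles=3):
    """Each worker writes one slot of a shared buffer, then all workers read it."""
    smem = [0] * num_workers                 # the single shared buffer
    barrier = threading.Barrier(num_workers)
    totals = [0] * num_workers

    def worker(wid):
        for tile in range(num_tiles):
            smem[wid] = tile * 10 + wid      # write phase (like store_smem)
            barrier.wait()                   # all writes land before any read
            totals[wid] += sum(smem)         # read phase (like load_frag + compute)
            barrier.wait()                   # all reads finish before the next overwrite

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(run_single_buffer())  # [138, 138, 138, 138]
```

Every worker executes the same code path, which mirrors the "all warps identical" point. Without the second barrier, a fast worker could overwrite its slot while a slower one is still summing the previous tile. That is the hazard a double-buffered design avoids by alternating between two buffers instead.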

comptime values

COMPUTE

comptime COMPUTE = DefaultMatmulOps.COMPUTE.value

LOAD_DRAM

comptime LOAD_DRAM = DefaultMatmulOps.LOAD_DRAM.value

LOAD_FRAG

comptime LOAD_FRAG = DefaultMatmulOps.LOAD_FRAG.value

STORE_SMEM

comptime STORE_SMEM = DefaultMatmulOps.STORE_SMEM.value

Structs

Functions
