
Mojo module

amd_matmul_schedule

Declarative software pipeline schedule for the default AMD matmul kernel.

This module defines the Loop Dependency Graph (LDG), schedule builder, and schedule hint derivation for the single-buffer matmul pipeline in matmul.mojo.

Architecture (single-buffer, barrier-gated pipeline):

Prologue: load_dram → store_smem → barrier → load_dram(prefetch) → load_frag[0]

Kernel body (per K-loop iteration, num_k_tiles=T): load_frag[1..T-1], compute[0], barrier, store_smem, load_dram(prefetch), compute[1..T-1], barrier, load_frag[0], schedule_group_barrier hints

Epilogue (2 drain iterations):

  • Drain 1: load_frag[1..T-1], barrier, store_smem, compute[0..T-1]
  • Drain 2: barrier, load_frag[0..T-1], compute[0..T-1]
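The three phases above can be modeled in a few lines of Python to check that the op counts line up. This is only a sketch, not the module's API: the function names (`prologue`, `kernel_body`, `epilogue`, `full_schedule`, `fragments_are_fresh`) and the string encoding of ops are assumptions made for illustration, and it assumes `load_frag[i]` and `compute[i]` are indexed by k-tile `i` in `0..T-1`.

```python
def prologue():
    # load_dram -> store_smem -> barrier -> load_dram(prefetch) -> load_frag[0]
    return ["load_dram", "store_smem", "barrier", "load_dram", "load_frag[0]"]


def kernel_body(T):
    # One steady-state K-loop iteration for num_k_tiles = T.
    ops = [f"load_frag[{i}]" for i in range(1, T)]
    ops += ["compute[0]", "barrier", "store_smem", "load_dram"]
    ops += [f"compute[{i}]" for i in range(1, T)]
    ops += ["barrier", "load_frag[0]"]
    return ops


def epilogue(T):
    # Drain 1 stores the last prefetched tile; drain 2 consumes it.
    drain1 = [f"load_frag[{i}]" for i in range(1, T)]
    drain1 += ["barrier", "store_smem"]
    drain1 += [f"compute[{i}]" for i in range(T)]
    drain2 = ["barrier"]
    drain2 += [f"load_frag[{i}]" for i in range(T)]
    drain2 += [f"compute[{i}]" for i in range(T)]
    return drain1 + drain2


def full_schedule(T, body_iters):
    # Hypothetical helper: flatten all phases into one op stream.
    return prologue() + kernel_body(T) * body_iters + epilogue(T)


def _index(op):
    # Extract i from "load_frag[i]" or "compute[i]".
    return int(op[op.index("[") + 1 : -1])


def fragments_are_fresh(ops):
    # Every compute[i] must consume a load_frag[i] issued since the
    # previous compute[i], and no loaded fragment may be left unused.
    ready = set()
    for op in ops:
        if op.startswith("load_frag["):
            ready.add(_index(op))
        elif op.startswith("compute["):
            if _index(op) not in ready:
                return False
            ready.remove(_index(op))
    return not ready


ops = full_schedule(T=2, body_iters=1)
print(len(ops), ops.count("load_dram"), ops.count("store_smem"))
print(fragments_are_fresh(ops))
```

In this model, a run with N kernel-body iterations issues N + 2 `load_dram` ops (two in the prologue, one per iteration) and N + 2 `store_smem` ops (one in the prologue, one per iteration, one in drain 1), which is why exactly two drain iterations are needed to consume the tiles still in flight.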

Key differences from ping-pong matmul:

  • Single SMEM buffer (barriers gate read/write phases, no double-buffering)
  • All warps identical (no warp groups or stagger)
  • Bundled ops: load_dram=A+B, load_frag=A+B, store_smem=A+B
  • Iterator-based K advancement (no KOffsetKind)
  • schedule_group_barrier hints instead of schedule_barrier fences
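The first bullet can be illustrated with a small hazard check. With a single SMEM buffer, a `store_smem` (write) and a `load_frag` (read) must never be adjacent SMEM accesses without a barrier between them. The sketch below is a simplified Python model, not code from this module: `smem_hazards` is a hypothetical name, ops are plain strings, and because all warps run the same op stream, one stream is used as a stand-in for the cross-warp ordering that a real barrier enforces.

```python
def smem_hazards(ops):
    """Return indices of SMEM accesses that switch between read and write
    mode without a barrier since the previous SMEM access."""
    hazards = []
    mode = None  # "read", "write", or None right after a barrier
    for i, op in enumerate(ops):
        if op == "barrier":
            mode = None
        elif op == "store_smem":
            if mode == "read":
                hazards.append(i)
            mode = "write"
        elif op.startswith("load_frag"):
            if mode == "write":
                hazards.append(i)
            mode = "read"
        # load_dram and compute do not touch SMEM in this model.
    return hazards


# One kernel-body iteration for T = 2, preceded by the load_frag[0]
# left over from the previous iteration (or the prologue).
body = [
    "load_frag[0]", "load_frag[1]", "compute[0]",
    "barrier", "store_smem", "load_dram", "compute[1]",
    "barrier", "load_frag[0]",
]

no_first_barrier = body[:3] + body[4:]   # drop the barrier before store_smem
no_second_barrier = body[:7] + body[8:]  # drop the barrier before load_frag[0]

print(smem_hazards(body))
print(smem_hazards(no_first_barrier))
print(smem_hazards(no_second_barrier))
```

Dropping either barrier produces a hazard in this model: without the first, `store_smem` could overwrite data that other warps are still reading; without the second, `load_frag[0]` could read a tile that has not been fully written.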

comptime values

COMPUTE

comptime COMPUTE = DefaultMatmulOps.COMPUTE.value

LOAD_DRAM

comptime LOAD_DRAM = DefaultMatmulOps.LOAD_DRAM.value

LOAD_FRAG

comptime LOAD_FRAG = DefaultMatmulOps.LOAD_FRAG.value

STORE_SMEM

comptime STORE_SMEM = DefaultMatmulOps.STORE_SMEM.value
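The four values above follow one pattern: each is a module-level alias for the `.value` of a `DefaultMatmulOps` member, so schedule code can refer to an op kind by a short name. A rough Python analogue is shown below. The enum here is a hypothetical stand-in: the actual definition of `DefaultMatmulOps` and its underlying values are not shown on this page, so `auto()` is used rather than guessing numbers, and the member order is arbitrary.

```python
from enum import Enum, auto


class DefaultMatmulOps(Enum):
    # Hypothetical stand-in for the Mojo type; real values are not
    # documented on this page.
    LOAD_DRAM = auto()
    STORE_SMEM = auto()
    LOAD_FRAG = auto()
    COMPUTE = auto()


# Module-level aliases, mirroring the comptime declarations above.
COMPUTE = DefaultMatmulOps.COMPUTE.value
LOAD_DRAM = DefaultMatmulOps.LOAD_DRAM.value
LOAD_FRAG = DefaultMatmulOps.LOAD_FRAG.value
STORE_SMEM = DefaultMatmulOps.STORE_SMEM.value

# The raw values can then tag ops in a schedule without repeating
# the enum name at every use site.
prologue_tags = [LOAD_DRAM, STORE_SMEM, LOAD_DRAM, LOAD_FRAG]
print(prologue_tags.count(LOAD_DRAM))
```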

Structs

Functions