
Mojo module

amd_matmul_schedule

Declarative software pipeline schedule for the default AMD matmul kernel.

This module defines the Loop Dependency Graph (LDG), schedule builder, and schedule hint derivation for the single-buffer matmul pipeline in matmul.mojo.

Architecture (single-buffer, barrier-gated pipeline):

Prologue: load_dram → store_smem → barrier → load_dram(prefetch) → load_frag[0]

Kernel body (per K-loop iteration, num_k_tiles=T): load_frag[1..T-1], compute[0], barrier, store_smem, load_dram(prefetch), compute[1..T-1], barrier, load_frag[0], schedule_group_barrier hints

Epilogue (2 drain iterations):
Drain 1: load_frag[1..T-1], barrier, store_smem, compute[0..T-1]
Drain 2: barrier, load_frag[0..T-1], compute[0..T-1]
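The three phases above can be written out as flat op lists to make the ordering easier to inspect. The following is a minimal Python sketch, not the Mojo schedule builder itself: the helper names (`op_range`, `prologue`, `kernel_body`, `epilogue`, `full_schedule`) and the string encoding of ops are assumptions made for illustration, and the `schedule_group_barrier` hints are left out.

```python
# Plain-Python model of the op ordering described above.
# All names here are hypothetical; this is not the Mojo API.

def op_range(name, start, end):
    """Ops name[start] .. name[end-1], one per k-tile."""
    return [f"{name}[{k}]" for k in range(start, end)]


def prologue():
    # First tile reaches SMEM, the next tile is prefetched from DRAM,
    # and fragment 0 is loaded.
    return ["load_dram", "store_smem", "barrier", "load_dram", "load_frag[0]"]


def kernel_body(num_k_tiles):
    t = num_k_tiles
    return (
        op_range("load_frag", 1, t)
        + ["compute[0]", "barrier", "store_smem", "load_dram"]
        + op_range("compute", 1, t)
        + ["barrier", "load_frag[0]"]
    )


def epilogue(num_k_tiles):
    t = num_k_tiles
    drain1 = (
        op_range("load_frag", 1, t)
        + ["barrier", "store_smem"]
        + op_range("compute", 0, t)
    )
    drain2 = ["barrier"] + op_range("load_frag", 0, t) + op_range("compute", 0, t)
    return drain1 + drain2


def full_schedule(num_k_tiles, num_body_iters):
    ops = prologue()
    for _ in range(num_body_iters):
        ops += kernel_body(num_k_tiles)
    return ops + epilogue(num_k_tiles)


if __name__ == "__main__":
    for op in full_schedule(num_k_tiles=2, num_body_iters=1):
        print(op)
```

In this model, a run with N kernel-body iterations issues N+2 `load_dram` ops and N+2 `store_smem` ops, which matches the two drain iterations: the prologue leaves one tile in flight in SMEM and one prefetched from DRAM, and the epilogue consumes both.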

Key differences from ping-pong matmul:

Single SMEM buffer (barriers gate read/write phases, no double-buffering)
All warps identical (no warp groups or stagger)
Bundled ops: load_dram=A+B, load_frag=A+B, store_smem=A+B
Iterator-based K advancement (no KOffsetKind)
schedule_group_barrier hints instead of schedule_barrier fences
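The single-buffer point can be checked mechanically: with one SMEM buffer, every switch between writing it (`store_smem`) and reading it (`load_frag`) needs a barrier in between. The sketch below is a hypothetical Python checker over a hand-written op list for T=2 with one kernel-body iteration. It assumes that `load_dram` and `compute` do not touch SMEM, which the pipeline description suggests but does not state outright.

```python
# Hypothetical checker, not part of the module: verifies that SMEM reads
# and writes in a flat op list are always separated by a barrier.

def check_barrier_gating(ops):
    """Return True if every read/write switch on SMEM is barrier-gated."""
    last_access = None     # "read" or "write": most recent SMEM access
    barrier_since = False  # has a barrier been seen since that access?
    for op in ops:
        if op == "barrier":
            barrier_since = True
            continue
        if op == "store_smem":
            access = "write"
        elif op.startswith("load_frag"):
            access = "read"
        else:
            continue  # assumption: load_dram and compute do not touch SMEM
        if last_access is not None and access != last_access and not barrier_since:
            return False
        last_access = access
        barrier_since = False
    return True


# Prologue, one kernel-body iteration, and both drains for T=2,
# transcribed from the architecture description.
OPS_T2 = [
    "load_dram", "store_smem", "barrier", "load_dram", "load_frag[0]",
    "load_frag[1]", "compute[0]", "barrier", "store_smem", "load_dram",
    "compute[1]", "barrier", "load_frag[0]",
    "load_frag[1]", "barrier", "store_smem", "compute[0]", "compute[1]",
    "barrier", "load_frag[0]", "load_frag[1]", "compute[0]", "compute[1]",
]

if __name__ == "__main__":
    print(check_barrier_gating(OPS_T2))
```

Dropping any one of the barriers from `OPS_T2` makes the check fail, which is the sense in which the barriers take the place of a second buffer in this pipeline.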

`comptime` values

`COMPUTE`

comptime COMPUTE = DefaultMatmulOps.COMPUTE.value

`LOAD_DRAM`

comptime LOAD_DRAM = DefaultMatmulOps.LOAD_DRAM.value

`LOAD_FRAG`

comptime LOAD_FRAG = DefaultMatmulOps.LOAD_FRAG.value

`STORE_SMEM`

comptime STORE_SMEM = DefaultMatmulOps.STORE_SMEM.value

Structs

DefaultMatmulOps: Op tags for the default single-buffer matmul kernel.
SingleBufferSchedule: Declarative schedule for the default single-buffer matmul.

Functions

build_default_matmul_schedule: Build the complete software pipeline schedule for the default matmul.
compute_range: Build a Pipe of compute ops for k-tiles start..end-1.
load_frags: Build a Pipe of load_frag ops for k-tiles start..end-1.
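To make the `start..end-1` convention concrete, here is a small Python stand-in for the op tags and the two range builders. Everything in it is an assumption for illustration: `OpTag` mirrors `DefaultMatmulOps` only in member names, the integer values are placeholders, and a plain list of `(tag, k_tile)` pairs stands in for the `Pipe` type, whose real interface is not shown on this page.

```python
from enum import IntEnum


class OpTag(IntEnum):
    # Hypothetical stand-in for DefaultMatmulOps; the values are placeholders.
    LOAD_DRAM = 0
    STORE_SMEM = 1
    LOAD_FRAG = 2
    COMPUTE = 3


# Module-level aliases, following the `X = DefaultMatmulOps.X.value` pattern.
COMPUTE = OpTag.COMPUTE.value
LOAD_FRAG = OpTag.LOAD_FRAG.value


def compute_range_model(start, end):
    """Compute ops for k-tiles start..end-1 (end is exclusive)."""
    return [(COMPUTE, k) for k in range(start, end)]


def load_frags_model(start, end):
    """load_frag ops for k-tiles start..end-1 (end is exclusive)."""
    return [(LOAD_FRAG, k) for k in range(start, end)]


if __name__ == "__main__":
    num_k_tiles = 4
    # Kernel-body pieces load_frag[1..T-1] and compute[1..T-1].
    print(load_frags_model(1, num_k_tiles))
    print(compute_range_model(1, num_k_tiles))
```

With this convention, `compute[1..T-1]` from the kernel body corresponds to `start=1, end=T`, and the range is empty when `T` is 1.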
