Mojo function

greedy_schedule

greedy_schedule(body: LoopBody) -> List[Int]

Constrained list scheduler for MMA-centered block structure.

Derives a valid execution order from the dependency graph, respecting structural constraints of the interleaved ping-pong kernel:

  1. Half isolation: ops 0..num_ops/2 are scheduled first (blocks 0-3), then ops num_ops/2..num_ops (blocks 4-7). This preserves the two-half warp-group structure that build_program_from_ldg_ordered() requires (first 12 ops → blocks 0-3, last 12 → blocks 4-7).
  2. Data dependencies: all d=0 predecessors (FLOW and ANTI) must be scheduled before the consumer. FLOW edges enforce RAW (register and accumulator) deps. ANTI edges enforce WAR (LDS buffer) deps: all mma_loads reading from an LDS buffer must complete before any prefetch global_load writes to that buffer.
  3. MMA-centered blocks: each block terminates with exactly 1 MMA.
  4. Priority: lowest op index wins among ready ops. This reproduces the declaration order from define_interleaved_loop_body(), which is the correct execution order. The ScheduleConfig wait counts (vmcnt, lgkmcnt) are calibrated for this specific ordering, so any reordering requires recalculating those counts.
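The constraints above can be illustrated with a minimal sketch. It is written in Python for readability and is not the Mojo implementation: the edge tuple layout, the helper names (`d0_predecessors`, `greedy_schedule_sketch`, `split_into_blocks`), and the `is_mma` flag list are all hypothetical stand-ins for whatever `LoopBody` actually stores. Constraint 3 is only checked after the fact here, not enforced during selection.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge record: (producer, consumer, kind, distance).
# kind is "FLOW" (RAW) or "ANTI" (WAR); distance 0 means same iteration.
Edge = Tuple[int, int, str, int]


def d0_predecessors(num_ops: int, edges: List[Edge]) -> Dict[int, Set[int]]:
    """Collect the d=0 predecessors of every op; FLOW and ANTI count alike."""
    preds: Dict[int, Set[int]] = {i: set() for i in range(num_ops)}
    for src, dst, _kind, distance in edges:
        if distance == 0:
            preds[dst].add(src)
    return preds


def greedy_schedule_sketch(num_ops: int, edges: List[Edge]) -> List[int]:
    """Greedy list scheduling with half isolation and lowest-index priority."""
    preds = d0_predecessors(num_ops, edges)
    half = num_ops // 2
    order: List[int] = []
    scheduled: Set[int] = set()
    # Half isolation: exhaust ops [0, half) before touching [half, num_ops).
    for lo, hi in ((0, half), (half, num_ops)):
        remaining = set(range(lo, hi))
        while remaining:
            # An op is ready once all of its d=0 predecessors are scheduled.
            ready = [i for i in remaining if preds[i] <= scheduled]
            if not ready:
                raise ValueError("no ready op: cycle or cross-half dependency")
            pick = min(ready)  # priority: lowest op index wins
            order.append(pick)
            scheduled.add(pick)
            remaining.remove(pick)
    return order


def split_into_blocks(order: List[int], is_mma: List[bool]) -> List[List[int]]:
    """Cut the order after each MMA so every block ends with exactly one MMA."""
    blocks: List[List[int]] = []
    current: List[int] = []
    for op in order:
        current.append(op)
        if is_mma[op]:
            blocks.append(current)
            current = []
    if current:
        raise ValueError("trailing ops after the last MMA")
    return blocks


# Toy body with 6 ops; each half is load, load, MMA.
edges = [
    (0, 2, "FLOW", 0), (1, 2, "FLOW", 0),  # loads feed the first MMA
    (3, 5, "FLOW", 0), (4, 5, "FLOW", 0),  # loads feed the second MMA
    (2, 3, "ANTI", 0),                     # WAR: op 2 must read before op 3 writes
    (5, 0, "FLOW", 1),                     # loop-carried edge, ignored (d != 0)
]
is_mma = [False, False, True, False, False, True]
order = greedy_schedule_sketch(6, edges)
blocks = split_into_blocks(order, is_mma)
```

Because the toy ops are declared in execution order, `order` comes out as the identity permutation. Declaring an op before its producer, for example `greedy_schedule_sketch(4, [(1, 0, "FLOW", 0)])`, forces op 1 ahead of op 0 within the first half.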

Returns a permutation of [0, num_ops) suitable for build_program_from_ldg_ordered(). When ops are defined in execution order, this produces the identity permutation, which validates that the dependency graph is sufficient to derive the schedule.
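A caller may want to confirm both properties of the returned list before passing it on. The following Python sketch shows one way to check them; `is_permutation` and `is_identity` are hypothetical helper names, not part of the documented API.

```python
from typing import List


def is_permutation(order: List[int], num_ops: int) -> bool:
    """True if order contains each index in [0, num_ops) exactly once."""
    return sorted(order) == list(range(num_ops))


def is_identity(order: List[int]) -> bool:
    """True if every op sits at the position equal to its own index."""
    return all(op == pos for pos, op in enumerate(order))
```

A schedule can be a valid permutation without being the identity. In that case, per the priority rule above, the ScheduleConfig wait counts would need to be recalculated for the new ordering.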

Returns:

List[Int]