
Mojo function

greedy_schedule

def greedy_schedule(body: LoopBody) -> List[Int]

Constrained list scheduler for MMA-centered block structure.

Derives a valid execution order from the dependency graph, respecting structural constraints of the interleaved ping-pong kernel:

Half isolation: ops 0..num_ops/2 are scheduled first (blocks 0-3), then ops num_ops/2..num_ops (blocks 4-7). This preserves the two-half warp-group structure that build_program_from_ldg_ordered() requires (first 12 ops → blocks 0-3, last 12 → blocks 4-7).
Data dependencies: all d=0 predecessors (FLOW and ANTI) must be scheduled before the consumer. FLOW edges enforce RAW (register and accumulator) deps. ANTI edges enforce WAR (LDS buffer) deps: all mma_loads reading from an LDS buffer must complete before any prefetch global_load writes to that buffer.
MMA-centered blocks: each block terminates with exactly 1 MMA.
Priority: lowest op index wins among ready ops. This reproduces the declaration order from define_interleaved_loop_body(), which is the correct execution order. The ScheduleConfig wait counts (vmcnt, lgkmcnt) are calibrated for this specific ordering, so any reordering requires recalculating those counts.
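The constraints above can be sketched in a few lines of Python. This is a simplified illustration, not the Mojo implementation: the function name `greedy_schedule_sketch` and the `preds` dictionary (standing in for the d=0 FLOW and ANTI edges of a `LoopBody`) are hypothetical, and the MMA-centered block constraint is deliberately left out. It shows only half isolation, dependency readiness, and lowest-index priority.

```python
from typing import Dict, List, Set


def greedy_schedule_sketch(num_ops: int, preds: Dict[int, Set[int]]) -> List[int]:
    """Return an execution order for ops 0..num_ops-1.

    preds[i] is the (hypothetical) set of d=0 predecessors of op i,
    with FLOW and ANTI edges merged into one set.
    """
    half = num_ops // 2
    order: List[int] = []
    done: Set[int] = set()

    # Half isolation: drain the first half completely before the second.
    for lo, hi in ((0, half), (half, num_ops)):
        pending = set(range(lo, hi))
        while pending:
            # An op is ready once every predecessor is already scheduled.
            ready = [op for op in sorted(pending) if preds.get(op, set()) <= done]
            if not ready:
                raise ValueError("no ready op: cyclic or cross-half dependency")
            op = ready[0]  # Priority: lowest op index wins.
            order.append(op)
            done.add(op)
            pending.remove(op)
    return order


# Ops declared in execution order yield the identity permutation.
in_order = greedy_schedule_sketch(6, {1: {0}, 2: {1}, 4: {3}, 5: {2, 4}})

# Op 0 depends on op 2, so the first half is reordered.
reordered = greedy_schedule_sketch(6, {0: {2}})
```

In this model a first-half op that depends on a second-half op can never become ready, so the sketch raises instead of breaking half isolation.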

Returns a permutation of [0, num_ops) suitable for build_program_from_ldg_ordered(). When ops are defined in execution order, this produces the identity permutation, which confirms that the dependency graph alone is sufficient to derive the schedule.
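A caller might sanity-check the returned order before passing it on. The helpers below are a minimal Python sketch with hypothetical names (`is_permutation`, `is_identity`); they are not part of the documented API and only illustrate the two properties described above.

```python
from typing import List


def is_permutation(order: List[int], num_ops: int) -> bool:
    """True if order contains each index in [0, num_ops) exactly once."""
    return sorted(order) == list(range(num_ops))


def is_identity(order: List[int]) -> bool:
    """True if every op stays at its declaration position."""
    return all(op == pos for pos, op in enumerate(order))
```

For example, `[1, 2, 0, 3]` is a valid permutation of four ops but not the identity, which would indicate that the ops were not declared in execution order.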

Returns:

List[Int]