Mojo function

derive_waits_from_blocks

def derive_waits_from_blocks(program: PipelineProgram, config: PipelineConfig, lgkm_per_a: Int = Int(0), lgkm_per_b: Int = Int(0)) -> Tuple[Int, Int]

Derive wait counts from the finalized block structure.

Unlike the old derive_wait_counts (which operated on the flat LDG ordering before block construction), this function works on the final PipelineProgram, after both CSP ordering and post-construction redistribution. The counts therefore always reflect the actual block layout.

Per-channel lgkm cost is read from the config first (config.lgkm_per_channel(channel)). The legacy lgkm_per_a and lgkm_per_b parameters are consulted only when the config has them unset (= 0), which preserves backward compatibility for callers that still thread them through ScheduleConfig.
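The config-first fallback described above can be modeled in a few lines. This is a minimal Python sketch of the rule, not the Mojo implementation: resolve_lgkm_cost and its two integer arguments are hypothetical stand-ins for the value returned by config.lgkm_per_channel(channel) and for the legacy lgkm_per_a or lgkm_per_b parameter.

```python
def resolve_lgkm_cost(config_value: int, legacy_value: int = 0) -> int:
    """Pick the per-channel lgkm cost (hypothetical helper).

    config_value: stand-in for config.lgkm_per_channel(channel).
    legacy_value: stand-in for the legacy lgkm_per_a / lgkm_per_b parameter.
    """
    # The config wins whenever it is set; 0 is treated as "unset".
    if config_value != 0:
        return config_value
    # Only an unset config falls back to the legacy parameter.
    return legacy_value


print(resolve_lgkm_cost(4, 2))  # config is set, legacy value is ignored
print(resolve_lgkm_cost(0, 2))  # config is unset, legacy value is used
```

Under this reading, a caller that sets the cost on the config never needs to pass the legacy parameters, and their default of 0 leaves the result unchanged.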

wait_lgkm_first: the number of lgkm ops in block 0's pre_ops (the fragment loads issued before the first barrier/MMA). Waiting on this count ensures the fragment loads complete before the MMA consumes their register values.
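One way to picture the wait_lgkm_first count is the Python sketch below. It is only a model under stated assumptions: the Op record, its kind and channel fields, and the rule that each fragment load contributes its channel's lgkm cost are all hypothetical, since this page does not show the real pre_ops layout or how the per-channel cost enters the count.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str     # hypothetical tag, e.g. "frag_load", "barrier", "mma"
    channel: int  # hypothetical channel index: 0 for A, 1 for B


def wait_lgkm_first(pre_ops, lgkm_per_channel):
    """Count lgkm ops among block 0's pre_ops (illustrative model only).

    Assumption: each fragment load on a channel issues
    lgkm_per_channel[channel] lgkm ops; other op kinds issue none.
    """
    return sum(
        lgkm_per_channel[op.channel]
        for op in pre_ops
        if op.kind == "frag_load"
    )


# Two fragment loads (one per channel) followed by a barrier.
block0_pre_ops = [Op("frag_load", 0), Op("frag_load", 1), Op("barrier", 0)]
print(wait_lgkm_first(block0_pre_ops, {0: 2, 1: 1}))
```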

wait_vm_last: at the last block's pre_sync, all global loads from all blocks in this half have been issued (globals come before pre_sync in the block layout). Completion loads must have finished, while prefetch loads may remain outstanding, so wait_vm = total_vm_in_half - completion_vm, which equals the number of prefetch loads in the half.

Completion detection uses stage-based logic, not the k_offset-based global_load_prefetch flag (which serves the prologue). A load is a completion load if its stage matches the other half's read stage (stage != half), because the other half's fragment loads will read from that LDS stage after the half-boundary barrier. A load to the same half's stage (stage == half) is a prefetch load: it won't be read until the next iteration of this half, so it can remain outstanding.
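The stage-based classification and the resulting wait_vm arithmetic can be sketched as follows. This Python model is illustrative rather than the Mojo implementation: GlobalLoad and its stage field are hypothetical, and the sketch assumes each global load counts as exactly one vm op and that stage and half indices are directly comparable (0 or 1).

```python
from dataclasses import dataclass


@dataclass
class GlobalLoad:
    stage: int  # hypothetical field: LDS stage this load writes to


def is_completion(load: GlobalLoad, half: int) -> bool:
    # A load targeting the other half's read stage must finish before
    # the half-boundary barrier, so it is a completion load.
    return load.stage != half


def wait_vm_last(loads, half: int) -> int:
    """wait_vm = total_vm_in_half - completion_vm (one vm op per load assumed)."""
    total_vm = len(loads)
    completion_vm = sum(1 for load in loads if is_completion(load, half))
    return total_vm - completion_vm


# Half 0 issues three global loads: one to its own stage (prefetch)
# and two to the other half's stage (completion).
loads_in_half0 = [GlobalLoad(stage=0), GlobalLoad(stage=1), GlobalLoad(stage=1)]
print(wait_vm_last(loads_in_half0, half=0))
```

In this model the result is the number of prefetch loads, which are the only loads permitted to remain outstanding at the last block's pre_sync.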

Returns (wait_lgkm_first, wait_vm_last).

Returns:

Tuple[Int, Int]