Mojo function

derive_waits_from_blocks

derive_waits_from_blocks(program: PipelineProgram, config: PipelineConfig, lgkm_per_a: Int = 0, lgkm_per_b: Int = 0) -> Tuple[Int, Int]

Derive wait counts from the finalized block structure.

Unlike the old derive_wait_counts, which operated on the flat LDG ordering before block construction, this function works on the final PipelineProgram, after both CSP ordering and post-construction redistribution. As a result, the counts always reflect the actual block layout.

The per-channel lgkm cost is read from the config first (config.lgkm_per_channel(channel)). The legacy lgkm_per_a and lgkm_per_b parameters are consulted only when the corresponding config value is unset (= 0), which preserves backward compatibility for callers that still thread them through ScheduleConfig.
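The fallback order can be sketched as follows. This is a minimal illustration, not the library's code: resolve_lgkm_cost and its parameters are hypothetical names, with config_value standing in for the result of config.lgkm_per_channel(channel) and legacy_value for lgkm_per_a or lgkm_per_b.

```mojo
# Hypothetical helper (not part of the documented API): choose the
# per-channel lgkm cost, preferring the config value over the legacy one.
def resolve_lgkm_cost(config_value: Int, legacy_value: Int) -> Int:
    # A config value of 0 means "unset"; only then is the legacy
    # parameter consulted.
    if config_value != 0:
        return config_value
    return legacy_value


def main():
    print(resolve_lgkm_cost(4, 2))  # config set, legacy ignored: prints 4
    print(resolve_lgkm_cost(0, 2))  # config unset, legacy used: prints 2
    print(resolve_lgkm_cost(0, 0))  # both unset: prints 0
```

Under this reading, a caller that passes only the legacy parameters keeps its old behavior, while a caller that sets the config value never sees the legacy value applied.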

wait_lgkm_first: the number of lgkm ops in block 0's pre_ops (fragment loads issued before the first barrier/MMA). This wait ensures that fragment loads complete before the MMA consumes their register values.

wait_vm_last: at the last block's pre_sync, all global loads from all blocks in this half have been issued (globals come before pre_sync in the block layout). Completion loads must have finished; prefetch loads may remain outstanding. wait_vm = total_vm_in_half - completion_vm.
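As a worked example of the formula above, the sketch below computes the wait count from two plain integers. The helper name and the example counts are assumptions chosen for illustration; the real function derives both quantities from the PipelineProgram.

```mojo
# Hypothetical helper mirroring the formula in the text:
# wait_vm = total_vm_in_half - completion_vm.
def wait_vm_last(total_vm_in_half: Int, completion_vm: Int) -> Int:
    return total_vm_in_half - completion_vm


def main():
    # Assumed example: 6 global loads were issued in this half,
    # and 4 of them are completion loads.
    print(wait_vm_last(6, 4))  # prints 2
```

In this example the result, 2, equals the number of prefetch loads, which are the loads allowed to remain outstanding at the last block's pre_sync.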

Completion detection is stage-based. It does not use the k_offset-based global_load_prefetch flag, which serves the prologue. A load is a completion load if its stage matches the other half's read stage (stage != half), because the other half's fragment loads will read from that LDS stage after the half-boundary barrier. A load to the same half's stage (stage == half) is a prefetch load: it won't be read until the next iteration of this half, so it can remain outstanding.
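The stage test reduces to a single comparison. The sketch below is illustrative only: is_completion_load is a hypothetical name, and stage and half are modeled as plain Int values rather than fields of the real load and block types.

```mojo
# Hypothetical predicate (not part of the documented API): a global load is
# a completion load when it targets the other half's stage.
def is_completion_load(stage: Int, half: Int) -> Bool:
    return stage != half


def main():
    # Loads issued while processing half 0:
    print(is_completion_load(1, 0))  # other half's stage, completion: True
    print(is_completion_load(0, 0))  # same half's stage, prefetch: False
```

Counting the loads in a half for which this predicate holds gives the completion_vm term used in the wait_vm formula.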

Returns (wait_lgkm_first, wait_vm_last).

Returns:

Tuple[Int, Int]