Mojo struct
Pipeline4Wave
struct Pipeline4Wave[geometry: KernelGeometry]
4-wave pipeline schedule with cross-stage register rotation.
The schedule returns its 24-op body in mini-iter order. The framework consumes
that order verbatim under SchedulingStrategy.IDENTITY, so the final kernel
emission matches the hand-written _run_iter body op for op (modulo wait-count
derivation, which the framework handles via derive_waits_from_blocks).
It takes a KernelGeometry (constants derived from the kernel shape) as its
only template parameter, replacing the previous [is_fp8, lgkm_a, lgkm_b] triple.
The lgkm_per_load_* costs are read directly from geometry rather than threaded
through ScheduleConfig.
Parameters
- geometry (KernelGeometry): Kernel-shape-derived constants (lgkm/vm costs, etc.).
Implemented traits
AnyType,
ImplicitlyDestructible,
PipelineSchedule
Methods
__init__
__init__(out self, config: ScheduleConfig = Pipeline4Wave._default_schedule_config(), target: TargetProfile = mi355x_target(4, 4, 1))
Constructs a Pipeline4Wave schedule with optional overrides.
Args:
- config (ScheduleConfig): Schedule-level knobs (wait counts, barrier policy). Cross-stage-rotation invariants are re-applied even if the caller mutates them.
- target (TargetProfile): Target hardware profile (defaults to MI355X).
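The note that cross-stage-rotation invariants are re-applied after the caller's overrides can be illustrated with a small Python model. This is a sketch under stated assumptions, not the Mojo implementation: ScheduleConfigSketch, its field names (wait_vm, barrier_policy, cross_stage_rotation, reorder_body), apply_rotation_invariants, and PipelineSketch are all hypothetical stand-ins, because the real ScheduleConfig fields are not listed on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScheduleConfigSketch:
    """Hypothetical stand-in for ScheduleConfig; field names are illustrative."""

    wait_vm: int = 0                   # user-tunable knob
    barrier_policy: str = "minimal"    # user-tunable knob
    cross_stage_rotation: bool = True  # invariant the schedule depends on
    reorder_body: bool = False         # invariant: body must stay in mini-iter order


def apply_rotation_invariants(cfg: ScheduleConfigSketch) -> ScheduleConfigSketch:
    """Keep the caller's tunable knobs but force the rotation invariants back on."""
    return replace(cfg, cross_stage_rotation=True, reorder_body=False)


class PipelineSketch:
    def __init__(self, config: ScheduleConfigSketch = ScheduleConfigSketch()) -> None:
        # Invariants are re-applied after accepting the override, so a caller
        # cannot accidentally disable them.
        self.config = apply_rotation_invariants(config)


# A caller tries to change both a tunable knob and the invariants.
user_cfg = ScheduleConfigSketch(wait_vm=3, reorder_body=True, cross_stage_rotation=False)
pipe = PipelineSketch(user_cfg)
```

The tunable override (wait_vm=3) survives, while the two invariant fields are restored. Using a frozen dataclass with replace keeps the caller's object untouched.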
config
config(self) -> PipelineConfig
Returns the underlying target PipelineConfig.
Returns:
PipelineConfig: The pipeline config from the target profile.
declare_ops
declare_ops(self) -> List[OpDesc]
Declares the logical 24-op body across both K-partitions.
Returns:
List[OpDesc]: The full list of OpDescs in mini-iter order.
build_body
build_body(self) -> List[OpDesc]
Annotates the logical ops with the target's cost model.
Skips double_buffer_reorder: the body is already in mini-iter
order, and mma_block_interleave_list would break cross-stage
frag placement. It matches frags to MMAs by subtile only and
ignores the frag's stage field, so it cannot distinguish a
same-stage sub=0 frag from a cross-stage sub=0 frag.
Returns:
List[OpDesc]: The annotated list of OpDescs ready for compilation.
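The ambiguity that makes mma_block_interleave_list unsafe here can be shown with a toy Python model. This is a hedged sketch, not the framework's matcher. The Frag type and the two match functions are hypothetical and only capture the (stage, subtile) keying described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frag:
    name: str
    stage: int    # pipeline stage whose buffer the frag reads
    subtile: int  # subtile of the MMA tile the frag feeds


def match_by_subtile(frags: List[Frag], subtile: int) -> List[Frag]:
    """Subtile-only matching: the stage field is ignored."""
    return [f for f in frags if f.subtile == subtile]


def match_by_stage_and_subtile(frags: List[Frag], stage: int, subtile: int) -> List[Frag]:
    """Stage-aware matching: both fields must agree."""
    return [f for f in frags if f.stage == stage and f.subtile == subtile]


frags = [
    Frag("sub0_same_stage", stage=0, subtile=0),
    Frag("sub0_cross_stage", stage=1, subtile=0),
    Frag("sub1_same_stage", stage=0, subtile=1),
]

# Subtile-only matching finds two candidates for sub=0 and cannot choose.
ambiguous = match_by_subtile(frags, 0)
# Including the stage makes the match unique.
cross = match_by_stage_and_subtile(frags, stage=1, subtile=0)
```

Any pass that keys only on subtile sees both sub=0 frags as interchangeable. This is why the body is left in its hand-chosen order rather than re-interleaved.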
bootstrap_frags
bootstrap_frags(self) -> List[OpDesc]
Bootstraps A_quad[0] + B_quad[0] for the first main-loop iter.
The body's sub=0 frags read the cross stage as part of the
cross-stage rotation pattern. For the very first main iter
there's no previous half to have populated those quadrants, so
we explicitly emit two same-stage sub=0 frag-loads here. The
framework pairs each with a partial wait_vm drain (and a
barrier), so each frag-load fires only once the prefetch it
depends on has completed, while the remaining 6 prefetches stay in flight.
Returns:
List[OpDesc]: A 2-element list of A/B sub=0 frag-load OpDescs.
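The arithmetic behind a partial drain can be sketched as follows. On AMD CDNA GPUs, a vmcnt wait blocks until the number of outstanding vector-memory operations drops to the given value. If loads retire in issue order (an assumption of this sketch), waiting for the load at position i out of n in flight needs a count of n - i - 1. The issue order below, with A_quad[0] and B_quad[0] first followed by 6 other prefetches, is hypothetical. So are the helper names partial_vm_drain and bootstrap_waits. Neither is the framework's derivation code.

```python
from typing import Dict, List


def partial_vm_drain(num_in_flight: int, load_index: int) -> int:
    """vmcnt value that waits for the load at load_index (0 = oldest)
    while leaving every younger load outstanding.

    Assumes loads retire in issue order, so waiting until at most N loads
    are outstanding guarantees that all but the N youngest have completed.
    """
    if not 0 <= load_index < num_in_flight:
        raise ValueError("load_index must refer to an in-flight load")
    return num_in_flight - load_index - 1


def bootstrap_waits(prefetches: List[str], needed: List[str]) -> Dict[str, int]:
    """Partial-drain count for the prefetch each bootstrap frag-load reads."""
    return {name: partial_vm_drain(len(prefetches), prefetches.index(name)) for name in needed}


# Hypothetical issue order: the two quadrant-0 prefetches go first,
# followed by 6 other prefetches (8 in flight in total).
prefetches = ["A_quad0", "B_quad0"] + [f"prefetch_{i}" for i in range(6)]
waits = bootstrap_waits(prefetches, ["A_quad0", "B_quad0"])
```

Under these assumptions the two waits are vmcnt(7) and vmcnt(6). After the second one, exactly 6 prefetches are still outstanding, which matches the description above.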
derive_edges
derive_edges(self, body: List[OpDesc]) -> List[DepEdge]
Derives dependency edges with cross-stage rotation fixups.
Runs the framework's default edge derivation, filters out the
spurious same-partition FLOW edges that Phase 1 emits for
cross-stage frags, then appends the cross-partition FLOW and
same-partition ANTI edges. Both helpers live in
pipeline.phase_derivation and are reusable across cross-stage
rotation schedules.
Args:
- body (List[OpDesc]): The annotated op list returned by build_body().
Returns:
List[DepEdge]: The complete list of dependency edges for wait derivation.
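The filter-then-append structure of derive_edges can be modeled in a few lines of Python. This is a sketch only. Edge, drop_spurious_flow, derive_edges_sketch, and the op names are hypothetical, and the real helpers in pipeline.phase_derivation have their own names and signatures. The sketch also assumes that a "spurious" edge is a same-partition FLOW edge whose consumer is a cross-stage frag.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Edge:
    src: str               # producer (FLOW) or earlier reader (ANTI)
    dst: str               # consumer (FLOW) or later writer (ANTI)
    kind: str              # "FLOW" = read-after-write, "ANTI" = write-after-read
    cross_partition: bool  # True if src and dst belong to different K-partitions


def drop_spurious_flow(edges: Iterable[Edge], cross_stage_frags: Set[str]) -> List[Edge]:
    """Remove same-partition FLOW edges whose consumer is a cross-stage frag."""
    return [
        e for e in edges
        if not (e.kind == "FLOW" and not e.cross_partition and e.dst in cross_stage_frags)
    ]


def derive_edges_sketch(
    default_edges: Iterable[Edge], cross_stage_frags: Set[str], rotation_edges: Iterable[Edge]
) -> List[Edge]:
    """Default edges, minus the spurious ones, plus the rotation fixups."""
    return drop_spurious_flow(default_edges, cross_stage_frags) + list(rotation_edges)


default_edges = [
    Edge("load_a", "frag_x", "FLOW", cross_partition=False),  # spurious: frag_x is cross-stage
    Edge("load_a", "frag_y", "FLOW", cross_partition=False),  # legitimate same-stage dependency
]
rotation_edges = [
    Edge("load_b", "frag_x", "FLOW", cross_partition=True),   # cross-partition producer
    Edge("frag_x", "load_c", "ANTI", cross_partition=False),  # same-partition write-after-read
]
edges = derive_edges_sketch(default_edges, {"frag_x"}, rotation_edges)
```

Keeping the filter and the fixups as separate functions reflects the docstring's point that both pieces are reusable by other cross-stage rotation schedules.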
schedule_config
schedule_config(self) -> ScheduleConfig
Returns the schedule-level configuration for this pipeline.
Returns:
ScheduleConfig: The ScheduleConfig set up in __init__.
build_explicit_blocks
build_explicit_blocks(self, body: List[OpDesc], program: PipelineProgram) -> List[List[OpDesc]]
Emits each block via emit_minimal_barrier_block.
Each block has the same shape as a mini-iter of the hand-tuned
_run_iter: an optional sched_barrier wrap plus entry waits, the
frag/load section, an optional sync-group wrap containing
pre_sync, the barrier, and the post-barrier lgkm wait, and
finally the MMA.
Wait values, frag/load assignments, and barrier flags come
from program.blocks[i] (populated by _construct_mma_blocks
and auto-wait derivation). The schedule's only contribution is
choosing the helper; the per-block ops are entirely
framework-derived.
Args:
- body (List[OpDesc]): The annotated op list (unused here; ops come from program.blocks).
- program (PipelineProgram): The compiled pipeline program containing per-block wait counts and barrier flags.
Returns:
List[List[OpDesc]]: One inner list per block, each holding the ops emitted by emit_minimal_barrier_block.
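The per-block layout described for build_explicit_blocks can be sketched as a Python function that returns op names in order. This is a model under assumptions, not emit_minimal_barrier_block itself. The function name, the string op names, and the marker values (wait_vm(6), wait_lgkm(0)) are hypothetical. Placing the sched_barrier pair around the entry waits and the begin/end markers around the sync group is one reading of "wrap", not a confirmed detail.

```python
from typing import List, Optional


def emit_block_sketch(
    entry_waits: List[str],
    frag_loads: List[str],
    mma: str,
    sched_barrier_wrap: bool = False,
    sync_ops: Optional[List[str]] = None,
) -> List[str]:
    """Return one block's ops in the documented order (hypothetical model)."""
    ops: List[str] = []
    # 1. Entry waits, optionally fenced by scheduling barriers.
    if sched_barrier_wrap:
        ops.append("sched_barrier")
    ops.extend(entry_waits)
    if sched_barrier_wrap:
        ops.append("sched_barrier")
    # 2. Frag/load section assigned to this block.
    ops.extend(frag_loads)
    # 3. Optional sync group: pre_sync, barrier, post-barrier lgkm wait.
    if sync_ops is not None:
        ops.append("sync_group_begin")
        ops.extend(sync_ops)
        ops.append("sync_group_end")
    # 4. Every block ends with its MMA.
    ops.append(mma)
    return ops


block = emit_block_sketch(
    entry_waits=["wait_vm(6)"],
    frag_loads=["frag_A_sub0", "frag_B_sub0"],
    mma="mma_0",
    sched_barrier_wrap=True,
    sync_ops=["pre_sync", "barrier", "wait_lgkm(0)"],
)
```

In the real schedule, the contents of each section come from program.blocks[i]. The sketch only fixes the order in which those sections appear.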