
Mojo struct

Pipeline4Wave

struct Pipeline4Wave[geometry: KernelGeometry]

4-wave pipeline schedule with cross-stage register rotation.

Returns the 24-op body in mini-iter order. The framework consumes that order verbatim under SchedulingStrategy.IDENTITY, so the final kernel emission matches the hand-written _run_iter body op for op. The one exception is wait-count derivation, which the framework handles via derive_waits_from_blocks.

Takes a KernelGeometry (kernel-shape-derived constants) as its only template parameter, replacing the previous [is_fp8, lgkm_a, lgkm_b] triple. lgkm_per_load_* is read directly from geometry rather than threaded through ScheduleConfig.

Parameters​

  • ​geometry (KernelGeometry): Kernel-shape-derived constants (lgkm/vm costs, etc.).

Implemented traits​

AnyType, ImplicitlyDeletable, PipelineSchedule

Methods​

__init__​

def __init__(out self, config: ScheduleConfig = Pipeline4Wave._default_schedule_config(), target: TargetProfile = mi355x_target(Int(4), Int(4), Int(1)))

Constructs a Pipeline4Wave schedule with optional overrides.

Args:

  • ​config (ScheduleConfig): Schedule-level knobs (wait counts, barrier policy). Cross-stage-rotation invariants are re-applied even if the caller mutates them.
  • ​target (TargetProfile): Target hardware profile (defaults to MI355X).

config​

def config(self) -> PipelineConfig

Returns the underlying target PipelineConfig.

Returns:

PipelineConfig: The pipeline config from the target profile.

declare_ops​

def declare_ops(self) -> List[OpDesc]

Declares the logical 24-op body across both K-partitions.

Returns:

List[OpDesc]: The full list of OpDescs in mini-iter order.

build_body​

def build_body(self) -> List[OpDesc]

Annotates logical ops with the target cost model.

Skips double_buffer_reorder: the body is already in mini-iter order, and mma_block_interleave_list would break cross-stage frag placement. It matches frags to MMAs by subtile only, ignoring the frag's stage field, so it cannot distinguish a same-stage sub=0 frag from a cross-stage sub=0 frag.

Returns:

List[OpDesc]: The annotated list of OpDescs ready for compilation.
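The subtile-only matching problem described above can be illustrated with a small toy model. This is a Python sketch, not the Mojo implementation: the `Frag` record and both matcher functions are hypothetical stand-ins, and the only thing taken from the text is that frags carry a stage field and a subtile index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frag:
    # Hypothetical stand-in for a frag-load op; field names are illustrative.
    name: str
    stage: int
    subtile: int


def match_by_subtile(frags, subtile):
    """Select frags by subtile only, ignoring the stage field."""
    return [f.name for f in frags if f.subtile == subtile]


def match_by_stage_and_subtile(frags, stage, subtile):
    """Select frags by both stage and subtile."""
    return [f.name for f in frags if f.stage == stage and f.subtile == subtile]


frags = [
    Frag("same_stage_sub0", stage=0, subtile=0),
    Frag("cross_stage_sub0", stage=1, subtile=0),
    Frag("same_stage_sub1", stage=0, subtile=1),
]

# Subtile-only matching returns both sub=0 frags, so it cannot tell them apart.
ambiguous = match_by_subtile(frags, 0)
# Adding the stage to the key isolates the same-stage frag.
unambiguous = match_by_stage_and_subtile(frags, 0, 0)
```

In this toy, a reorder pass keyed on subtile alone has two candidates for the same slot, which is the ambiguity that leaving the body in its declared mini-iter order avoids.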

bootstrap_frags​

def bootstrap_frags(self) -> List[OpDesc]

Bootstraps A_quad[0] + B_quad[0] for the first main-loop iter.

The body's sub=0 frags read the cross stage as part of the cross-stage rotation pattern. For the very first main iter, no previous half has populated those quadrants, so two same-stage sub=0 frag-loads are emitted explicitly here. The framework pairs each with a partial wait_vm drain (and a barrier) so that each fires once exactly the prefetch it depends on has completed, while the remaining 6 prefetches stay in flight.

Returns:

List[OpDesc]: A 2-element list of A/B sub=0 frag-load OpDescs.
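The partial-drain arithmetic can be sketched as follows. This Python toy rests on assumptions the text does not state: that eight prefetches are issued in order (two consumed by the bootstrap frags plus the six left in flight), that prefetches complete in issue order, and that a wait count of N means "block until at most N prefetches are still outstanding". The function name is hypothetical.

```python
def wait_count_for(num_issued, dep_index):
    """Largest outstanding count that still guarantees prefetch `dep_index` landed.

    Assumes in-order completion: once at most (num_issued - 1 - dep_index)
    prefetches remain outstanding, prefetches 0..dep_index must be done.
    """
    return num_issued - 1 - dep_index


NUM_PREFETCHES = 8  # assumption: 2 bootstrap dependencies + 6 left in flight

# Assumption: the A and B bootstrap frags depend on the two oldest prefetches.
waits = [wait_count_for(NUM_PREFETCHES, dep) for dep in (0, 1)]

# After the second partial drain, this many prefetches are still in flight.
still_in_flight = waits[-1]
```

Under these assumptions the two drains use counts 7 and 6, so neither frag-load waits for more memory traffic than it needs.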

derive_edges​

def derive_edges(self, body: List[OpDesc]) -> List[DepEdge]

Derives dependency edges with cross-stage rotation fixups.

Runs the framework's default edge derivation, filters out the spurious same-partition FLOW edges that Phase 1 emits for cross-stage frags, then appends the cross-partition FLOW and same-partition ANTI edges. Both helpers live in pipeline.phase_derivation and are reusable across cross-stage rotation schedules.

Args:

  • ​body (List[OpDesc]): The annotated op list returned by build_body().

Returns:

List[DepEdge]: The complete list of dependency edges for wait derivation.
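The filter-then-append shape of this fixup can be sketched with a toy model. The Python below is not the pipeline.phase_derivation helpers: the `Edge` record, the `fix_up_edges` function, the op names, and the choice to identify a spurious edge by its destination frag are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical stand-in for a dependency edge.
    src: str
    dst: str
    kind: str  # "FLOW" or "ANTI"
    cross_partition: bool = False


def fix_up_edges(default_edges, cross_stage_frags, extra_edges):
    """Drop same-partition FLOW edges into cross-stage frags, then append fixups."""
    kept = [
        e
        for e in default_edges
        if not (
            e.kind == "FLOW"
            and not e.cross_partition
            and e.dst in cross_stage_frags
        )
    ]
    return kept + list(extra_edges)


default_edges = [
    Edge("load_a_p0", "frag_a_sub0_p0", "FLOW"),  # spurious: frag reads the other partition
    Edge("load_a_p0", "frag_a_sub1_p0", "FLOW"),  # genuine same-partition dependency
]
extra_edges = [
    Edge("load_a_p1", "frag_a_sub0_p0", "FLOW", cross_partition=True),
    Edge("frag_a_sub0_p0", "load_a_p0", "ANTI"),
]

edges = fix_up_edges(default_edges, {"frag_a_sub0_p0"}, extra_edges)
```

The genuine same-partition edge survives, the spurious one is removed, and the two fixup edges are appended after it.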

schedule_config​

def schedule_config(self) -> ScheduleConfig

Returns the schedule-level configuration for this pipeline.

Returns:

ScheduleConfig: The ScheduleConfig set up in __init__.

build_explicit_blocks​

def build_explicit_blocks(self, body: List[OpDesc], program: PipelineProgram) -> List[List[OpDesc]]

Emits each block via emit_minimal_barrier_block.

Same shape as the hand-tuned _run_iter's mini-iters: optional sched_barrier wrap + entry waits, frag/load section, optional sync-group wrap + pre_sync/barrier/post-barrier-lgkm, then the MMA.

Wait values, frag/load assignments, and barrier flags come from program.blocks[i] (populated by _construct_mma_blocks + auto-wait derivation). The schedule's only contribution is choosing the helper; the per-block ops are entirely framework-derived.

Args:

  • ​body (List[OpDesc]): The annotated op list (unused here; ops come from program.blocks).
  • ​program (PipelineProgram): The compiled pipeline program containing per-block wait counts and barrier flags.

Returns:

List[List[OpDesc]]: One inner list per block, each holding the ops emitted by emit_minimal_barrier_block.
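The per-block section order can be sketched with a toy emitter. This Python model is not emit_minimal_barrier_block: the `Block` fields and the op-name strings are hypothetical, and the optional sched_barrier and sync-group wraps are omitted because the text does not say exactly where they sit. It only shows the order entry waits, frag/load section, sync group, MMA, and the one-inner-list-per-block return shape.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical stand-in for one entry of program.blocks.
    entry_waits: list = field(default_factory=list)
    frags_and_loads: list = field(default_factory=list)
    has_barrier: bool = False
    mma: str = "mma"


def emit_block(block):
    """Emit one block's ops in the section order described above."""
    ops = list(block.entry_waits)
    ops.extend(block.frags_and_loads)
    if block.has_barrier:
        # Sync section: pre-sync, barrier, then the post-barrier lgkm wait.
        ops.extend(["pre_sync", "barrier", "post_barrier_lgkm"])
    ops.append(block.mma)
    return ops


def build_blocks(blocks):
    """Return one inner list of ops per block."""
    return [emit_block(b) for b in blocks]


blocks = [
    Block(entry_waits=["wait_vm"], frags_and_loads=["frag_a", "load_b"], has_barrier=True),
    Block(frags_and_loads=["frag_b"]),
]
emitted = build_blocks(blocks)
```

Every decision in the toy comes from the block data, mirroring the point that the schedule itself only selects the emitting helper.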