
Mojo struct

MMABlockSpec

struct MMABlockSpec

Declarative specification for one MMA block in a pipeline schedule.

Each MMA block follows a fixed pattern. Elements written as [name?] are optional; all others are always emitted:

[pre_op_0?], [pre_op_1?], [global_load?], [global_load_1?], [pre_sync?], barrier, [schedule_barrier?], [wait_lgkm(0)?], set_prio(1), mma, [fused_mma?], set_prio(0), barrier, [schedule_barrier?]

Optional elements are controlled by sentinel values:

  • OpDesc fields: _Ops.NONE.value means skip
  • Bool flags: False means skip the corresponding element

All operations are stored as pre-built OpDesc values, avoiding any runtime-to-comptime conversion.
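The fixed pattern can be modeled outside Mojo to see how optional slots drop out. The following is a minimal Python sketch, not the library's implementation: the slot keys (for example sched_after_first_barrier and sched_trailing) are hypothetical names invented for this illustration, and a missing key plays the role of a sentinel that means skip.

```python
# Each slot is (label, key). key is None for mandatory slots; otherwise it is a
# hypothetical name, used only in this sketch, that switches the slot on.
PATTERN = [
    ("pre_op_0", "pre_op_0"),
    ("pre_op_1", "pre_op_1"),
    ("global_load", "global_load"),
    ("global_load_1", "global_load_1"),
    ("pre_sync", "pre_sync"),
    ("barrier", None),
    ("schedule_barrier", "sched_after_first_barrier"),
    ("wait_lgkm(0)", "wait_lgkm"),
    ("set_prio(1)", None),
    ("mma", None),
    ("fused_mma", "fused_mma"),
    ("set_prio(0)", None),
    ("barrier", None),
    ("schedule_barrier", "sched_trailing"),
]


def expand_pattern(enabled=frozenset()):
    """Return the labels emitted when only the slots in `enabled` are present."""
    return [label for label, key in PATTERN if key is None or key in enabled]


if __name__ == "__main__":
    print(expand_pattern())
    print(expand_pattern({"pre_op_0", "fused_mma"}))
```

With nothing enabled, only the five mandatory entries remain; enabling a slot inserts it at its fixed place in the sequence.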

Fields

  • mma (OpDesc)
  • pre_op_0 (OpDesc)
  • pre_op_1 (OpDesc)
  • global_load (OpDesc)
  • global_load_1 (OpDesc)
  • entry_wait (OpDesc): wait_vm op emitted at the start of the block (or OpDesc.none()).
  • entry_wait_lgkm (OpDesc): wait_lgkm op emitted at the start of the block (or OpDesc.none()).
  • pre_sync (OpDesc)
  • fused_mma (OpDesc)
  • pre_mma_barrier (Bool): Emit an s_barrier before the MMA op.
  • pre_mma_set_prio (Bool): Emit s_setprio[1] before the MMA op.
  • post_mma_barrier (Bool): Emit an s_barrier after the MMA op.
  • post_mma_set_prio (Bool): Emit s_setprio[0] after the MMA op.
  • post_barrier_lgkm (Bool)
  • post_barrier_sched (Bool)
  • trailing_sched_barrier (Bool)
  • global_load_prefetch (Bool)
  • global_load_1_prefetch (Bool)
  • drain_lgkm_before_loads (Bool)
  • global_before_frag (Bool): Swaps the in-block order of global loads and fragment loads.
  • barrier_before_pre_ops (Bool): Moves the pre_sync + barrier section ahead of the frag/global section.
  • wrap_waits_with_sched_barrier (Bool): Wraps each contiguous wait/barrier group with schedule_barrier on both sides as an LLVM machine-scheduler fence.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable

Methods

__init__

__init__(
    out self,
    *,
    mma: OpDesc,
    pre_op_0: OpDesc = OpDesc.none(),
    pre_op_1: OpDesc = OpDesc.none(),
    global_load: OpDesc = OpDesc.none(),
    global_load_1: OpDesc = OpDesc.none(),
    entry_wait: OpDesc = OpDesc.none(),
    entry_wait_lgkm: OpDesc = OpDesc.none(),
    pre_sync: OpDesc = OpDesc.none(),
    fused_mma: OpDesc = OpDesc.none(),
    pre_mma_barrier: Bool = True,
    pre_mma_set_prio: Bool = True,
    post_mma_barrier: Bool = True,
    post_mma_set_prio: Bool = True,
    post_barrier_lgkm: Bool = True,
    post_barrier_sched: Bool = False,
    trailing_sched_barrier: Bool = False,
    global_load_prefetch: Bool = False,
    global_load_1_prefetch: Bool = False,
    drain_lgkm_before_loads: Bool = False,
    global_before_frag: Bool = False,
    barrier_before_pre_ops: Bool = False,
    wrap_waits_with_sched_barrier: Bool = False,
)

entry_count

entry_count(self) -> Int

Count the number of schedule entries this block will expand to.

With all flags True (the default ping-pong layout), the block expands to 5 mandatory entries (barrier, set_prio(1), mma, set_prio(0), barrier) plus one for each optional field present. Setting any of the pre_mma_* / post_mma_* Bool flags to False lets minimal_barriers schedules suppress the corresponding sync op.

Returns:

Int
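The counting rule above can be sketched in Python as a rough model. BlockModel is a hypothetical stand-in for MMABlockSpec, with None playing the role of OpDesc.none(). It covers only the documented rule (the mma entry, the four pre_mma_* / post_mma_* flags, and the optional OpDesc fields) and ignores post_barrier_lgkm, the schedule-barrier flags, and the entry waits, so its numbers are not guaranteed to match the real method's return values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockModel:
    """Hypothetical stand-in for MMABlockSpec; None models OpDesc.none()."""

    mma: str
    pre_op_0: Optional[str] = None
    pre_op_1: Optional[str] = None
    global_load: Optional[str] = None
    global_load_1: Optional[str] = None
    pre_sync: Optional[str] = None
    fused_mma: Optional[str] = None
    pre_mma_barrier: bool = True
    pre_mma_set_prio: bool = True
    post_mma_barrier: bool = True
    post_mma_set_prio: bool = True

    def entry_count(self) -> int:
        optional_ops = [
            self.pre_op_0, self.pre_op_1, self.global_load,
            self.global_load_1, self.pre_sync, self.fused_mma,
        ]
        sync_flags = [
            self.pre_mma_barrier, self.pre_mma_set_prio,
            self.post_mma_set_prio, self.post_mma_barrier,
        ]
        # One entry for the MMA op, one per enabled sync op,
        # and one per optional op that is actually present.
        return 1 + sum(sync_flags) + sum(op is not None for op in optional_ops)


if __name__ == "__main__":
    print(BlockModel(mma="mma").entry_count())
    print(BlockModel(mma="mma", pre_mma_barrier=False,
                     post_mma_barrier=False).entry_count())
```

In this model, the default block yields the 5 mandatory entries, and clearing both barrier flags drops the count to 3.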

mma_position

mma_position(self) -> Int

Return the offset of the MMA op within this block's entries.

Returns:

Int

post_barrier_lgkm_position

post_barrier_lgkm_position(self) -> Int

Return the offset of the post-barrier wait_lgkm(0) within entries.

Returns -1 if post_barrier_lgkm is False or the pre-MMA barrier is suppressed.

Returns:

Int
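The -1 sentinel can be illustrated with a small Python model. This is a sketch under assumptions, not the library's code: n_pre_entries is a hypothetical parameter for the number of entries emitted before the pre-MMA barrier, and post_barrier_sched is assumed to control the schedule_barrier that the block pattern places between that barrier and wait_lgkm(0).

```python
def post_barrier_lgkm_position(n_pre_entries, pre_mma_barrier=True,
                               post_barrier_sched=False, post_barrier_lgkm=True):
    """Model the offset of the post-barrier wait_lgkm(0), or -1 if it is absent."""
    # Documented rule: no wait is emitted if the flag is off or if the
    # pre-MMA barrier it would follow has been suppressed.
    if not (post_barrier_lgkm and pre_mma_barrier):
        return -1
    entries = ["pre"] * n_pre_entries + ["barrier"]
    if post_barrier_sched:  # assumed mapping, see the lead-in
        entries.append("schedule_barrier")
    entries.append("wait_lgkm(0)")
    return entries.index("wait_lgkm(0)")


if __name__ == "__main__":
    pos = post_barrier_lgkm_position(2, pre_mma_barrier=False)
    # Callers should test for the sentinel before using the offset as an index.
    if pos >= 0:
        print("wait_lgkm(0) at offset", pos)
    else:
        print("no post-barrier wait_lgkm(0) in this block")
```

Because -1 is a valid index in many languages (it selects the last element in Python), checking for the sentinel before indexing avoids silently touching the wrong entry.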

expand

expand[N: Int, phase: Phase](self, mut b: EntryBuilder[N, phase])

Expand this block spec into schedule entries via an EntryBuilder.

Emission shape is controlled by two per-block flags:

  • global_before_frag swaps the order of pre_op_* (frags) and global_load* (DRAM→LDS prefetches).
  • barrier_before_pre_ops moves the pre_sync + barrier section ahead of the frag/global section.
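The combined effect of the two flags on the pre-MMA part of a block can be sketched in Python. This is an illustrative model only: it assumes every optional op is present, uses plain strings instead of OpDesc values, and pre_mma_order is a hypothetical helper, not part of the API.

```python
def pre_mma_order(global_before_frag=False, barrier_before_pre_ops=False):
    """Model the order of the entries emitted before set_prio(1)."""
    frags = ["pre_op_0", "pre_op_1"]
    global_loads = ["global_load", "global_load_1"]
    sync = ["pre_sync", "barrier"]

    # global_before_frag swaps the two load groups.
    loads = global_loads + frags if global_before_frag else frags + global_loads
    # barrier_before_pre_ops moves the sync section ahead of the loads.
    return sync + loads if barrier_before_pre_ops else loads + sync


if __name__ == "__main__":
    for gbf in (False, True):
        for bbp in (False, True):
            print(gbf, bbp, pre_mma_order(gbf, bbp))
```

The two flags are independent in this model, so they give four possible layouts; with both False the order matches the default block pattern.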

expand_to_list

expand_to_list(self, mut out: List[ScheduleEntry], phase: Phase)

Expand this block spec by appending to a List.

Emission shape mirrors expand; see that method for the per-block ordering flags.