
Mojo struct

LoadOrderBarrier

struct LoadOrderBarrier

Barrier for coordinating mainloop load and epilogue load warps.

This barrier implements a simple producer-consumer pattern where the mainloop load warp (producer) signals after completing prologue loads, and the epilogue load warp (consumer) waits before starting C loads.

Protocol:

  1. Mainloop load warp issues prologue TMA loads
  2. Mainloop load warp calls arrive()
  3. Epilogue load warp calls wait() before starting
  4. Epilogue load warp can now issue TMA loads without contention

This ordering keeps the epilogue's C loads from competing with the mainloop's prologue loads for TMA resources.
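The four protocol steps can be sketched on the host with ordinary threads. This is a minimal Python model, not the Mojo GPU implementation: `threading.Event` stands in for the shared-memory barrier, and the function names `run_one_tile`, `mainloop_load_warp`, and `epilogue_load_warp` are hypothetical names chosen for illustration.

```python
import threading


def run_one_tile(num_prologue_loads=3):
    """Model one tile of the load-order protocol and return the event log."""
    events = []
    lock = threading.Lock()
    # Assumption: an Event stands in for the shared-memory barrier.
    prologue_done = threading.Event()

    def record(message):
        with lock:
            events.append(message)

    def mainloop_load_warp():
        # Step 1: issue the prologue loads.
        for i in range(num_prologue_loads):
            record(f"mainloop: prologue load {i}")
        # Step 2: arrive() signals the epilogue load warp.
        prologue_done.set()

    def epilogue_load_warp():
        # Step 3: wait() blocks until the mainloop load warp has arrived.
        prologue_done.wait()
        # Step 4: only now start the C load.
        record("epilogue: C load")

    # Start the consumer first to show that it really blocks.
    consumer = threading.Thread(target=epilogue_load_warp)
    producer = threading.Thread(target=mainloop_load_warp)
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return events


if __name__ == "__main__":
    for line in run_one_tile():
        print(line)
```

Whichever thread the scheduler runs first, the C load is always logged after every prologue load, which is the ordering the barrier is meant to guarantee.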

Phase Tracking

The barrier uses a single phase bit that toggles once per tile iteration. Toggling lets the same barrier be reused for each output tile, so a wait for the current tile cannot be satisfied by an arrival from an earlier tile.
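The role of the phase bit can be illustrated with a small single-threaded model. The sketch below is written in Python and rests on a stated assumption: the barrier completes a phase and flips its parity once the expected number of arrives has landed, and a wait on parity `p` succeeds once the phase with parity `p` has completed. The class `PhaseBarrierModel` and both helper functions are hypothetical; they are not part of the Mojo API and may not match the hardware barrier in every detail.

```python
class PhaseBarrierModel:
    """Toy model of a phase-tracked barrier (an assumption, not the real hardware)."""

    def __init__(self, arrive_count=1):
        self.arrive_count = arrive_count
        self.pending = arrive_count
        self.barrier_phase = 0  # parity of the phase currently in progress

    def arrive(self):
        # The last expected arrive completes the phase and flips the parity.
        self.pending -= 1
        if self.pending == 0:
            self.barrier_phase ^= 1
            self.pending = self.arrive_count

    def try_wait(self, parity):
        # True once the phase with this parity has completed.
        return self.barrier_phase != parity


def run_tiles(num_tiles):
    """Toggle the waiter's phase bit after every tile, as step() does."""
    barrier = PhaseBarrierModel(arrive_count=1)
    phase = 0
    log = []
    for tile in range(num_tiles):
        ready_before = barrier.try_wait(phase)  # producer has not arrived yet
        barrier.arrive()
        ready_after = barrier.try_wait(phase)   # producer has arrived
        log.append((tile, ready_before, ready_after))
        phase ^= 1                              # step(): prepare for the next tile
    return log


def stale_phase_passes_early():
    """Show what goes wrong if the phase bit is not toggled between tiles."""
    barrier = PhaseBarrierModel(arrive_count=1)
    phase = 0
    barrier.arrive()  # tile 0 arrival completes the first phase
    # Tile 1: the producer has not arrived again, but the stale phase
    # bit still matches the already completed tile 0 phase.
    return barrier.try_wait(phase)


if __name__ == "__main__":
    print(run_tiles(3))
    print(stale_phase_passes_early())
```

In this model each tile's wait is unsatisfied before the arrive and satisfied after it, while a waiter that forgets to toggle its phase bit passes tile 1's wait before the producer has arrived for that tile.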

Fields

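  • barrier (MbarPtr): The shared-memory barrier that the two warps synchronize on.
  • phase (UInt32): The current phase bit, toggled by step().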

Implemented traits

AnyType, ImplicitlyDestructible

comptime members

__del__is_trivial

comptime __del__is_trivial = True

Methods

__init__

__init__(out self, ptr: LegacyUnsafePointer[SharedMemBarrier, address_space=AddressSpace.SHARED], initial_phase: UInt32 = 0)

Initialize the load order barrier.

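Args:

  • ptr (LegacyUnsafePointer[SharedMemBarrier, address_space=AddressSpace.SHARED]): Pointer to the barrier in shared memory.
  • initial_phase (UInt32): Initial value of the phase bit (default 0).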

init

init(self, arrive_count: Int32 = 1)

Initialize the barrier.

Should be called by a single thread (elect_one_thread) during kernel initialization.

Args:

  • arrive_count (Int32): Number of arrives to expect (default 1 for single mainloop load warp).

arrive

arrive(self)

Signal that mainloop prologue loads are complete.

Called by the mainloop load warp after issuing prologue TMA loads.

wait

wait(self)

Wait for mainloop to signal prologue completion.

Called by the epilogue load warp before starting C loads.

step

step(mut self)

Toggle phase for next tile iteration.

Called after both arrive and wait have completed to prepare for the next output tile's synchronization.

arrive_and_step

arrive_and_step(mut self)

Arrive and advance phase in one call.

Convenience method for the mainloop load warp, combining arrive() and step():

load_order_barrier.arrive_and_step()

wait_and_step

wait_and_step(mut self)

Wait and advance phase in one call.

Convenience method for the epilogue load warp, combining wait() and step():

load_order_barrier.wait_and_step()
