Mojo struct
LoadOrderBarrier
struct LoadOrderBarrier
Barrier for coordinating mainloop load and epilogue load warps.
This barrier implements a simple producer-consumer pattern where the mainloop load warp (producer) signals after completing prologue loads, and the epilogue load warp (consumer) waits before starting C loads.
Protocol:
- Mainloop load warp issues prologue TMA loads
- Mainloop load warp calls arrive()
- Epilogue load warp calls wait() before starting C loads
- Epilogue load warp can now issue TMA loads without contention
This prevents TMA resource contention and ensures proper ordering.
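The four protocol steps above can be sketched on the host. This is a minimal Python model, not the GPU implementation: `threading.Event` stands in for the shared-memory barrier, the two threads stand in for the two warps, and the function names and log strings are hypothetical.

```python
import threading

def run_one_tile():
    """Model one tile: prologue loads must be issued before C loads."""
    log = []                            # order in which "loads" are issued
    prologue_done = threading.Event()   # stand-in for the shared-memory barrier

    def mainloop_load_warp():
        log.append("prologue TMA loads")  # step 1: issue prologue loads
        prologue_done.set()               # step 2: arrive()

    def epilogue_load_warp():
        prologue_done.wait()              # step 3: wait()
        log.append("C TMA loads")         # step 4: C loads start only now

    consumer = threading.Thread(target=epilogue_load_warp)
    producer = threading.Thread(target=mainloop_load_warp)
    consumer.start()   # started first to show that it blocks until arrive()
    producer.start()
    producer.join()
    consumer.join()
    return log

order = run_one_tile()
print(order)
```

Whichever thread is scheduled first, the consumer's entry is logged only after the producer has signalled, which is the ordering guarantee the barrier provides.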
Phase Tracking
The barrier uses a single phase bit that toggles per tile iteration. This allows proper synchronization across multiple output tiles.
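To see why the phase bit has to toggle, consider a simplified host-side model. It assumes parity-style wait semantics, where a wait on parity `p` succeeds once the barrier phase with that parity has completed; the class `PhaseBarrierModel` and its `try_wait` method are hypothetical and only illustrate the bookkeeping.

```python
class PhaseBarrierModel:
    """Simplified model of a barrier whose phases are identified by parity."""

    def __init__(self, arrive_count=1):
        self.arrive_count = arrive_count
        self.pending = arrive_count   # arrivals still needed in this phase
        self.completed = 0            # number of phases completed so far

    def arrive(self):
        self.pending -= 1
        if self.pending == 0:         # last arrival completes the phase
            self.completed += 1
            self.pending = self.arrive_count

    def try_wait(self, parity):
        # True once the phase with this parity has completed
        # (assumed parity-wait semantics, non-blocking for the model).
        return (self.completed & 1) != parity


barrier = PhaseBarrierModel(arrive_count=1)
consumer_phase = 0

# Tile 0: the consumer is held back until the producer arrives.
tile0_before = barrier.try_wait(consumer_phase)
barrier.arrive()
tile0_after = barrier.try_wait(consumer_phase)

# Tile 1: reusing the old phase would pass before the producer arrives.
stale = barrier.try_wait(consumer_phase)

# Toggling the phase (what step() does) restores the ordering.
consumer_phase ^= 1
fresh_before = barrier.try_wait(consumer_phase)
barrier.arrive()
fresh_after = barrier.try_wait(consumer_phase)

print(tile0_before, tile0_after, stale, fresh_before, fresh_after)
```

In this model a consumer that skips the toggle passes tile 1's wait on the strength of tile 0's arrival, which is the failure the per-tile phase toggle avoids.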
Fields
- barrier (MbarPtr)
- phase (UInt32)
Implemented traits
AnyType, ImplicitlyDestructible
comptime members
__del__is_trivial
comptime __del__is_trivial = True
Methods
__init__
__init__(out self, ptr: LegacyUnsafePointer[SharedMemBarrier, address_space=AddressSpace.SHARED], initial_phase: UInt32 = 0)
Initialize the load order barrier.
Args:
- ptr (LegacyUnsafePointer): Pointer to shared memory barrier.
- initial_phase (UInt32): Initial phase (default 0).
init
init(self, arrive_count: Int32 = 1)
Initialize the barrier.
Should be called by a single thread (elect_one_thread) during kernel initialization.
Args:
- arrive_count (Int32): Number of arrives to expect (default 1 for single mainloop load warp).
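The effect of `arrive_count` can be illustrated with a small host-side counter. It assumes that a barrier phase completes only once the expected number of arrivals has been recorded; `ArriveCounter` is a hypothetical name, not part of the Mojo API.

```python
class ArriveCounter:
    """Counts arrivals and records how many barrier phases have completed."""

    def __init__(self, arrive_count=1):
        self.arrive_count = arrive_count
        self.pending = arrive_count
        self.phases_completed = 0

    def arrive(self):
        self.pending -= 1
        if self.pending == 0:               # expected arrivals reached
            self.phases_completed += 1
            self.pending = self.arrive_count  # re-arm for the next phase


# Default: a single mainloop load warp completes the phase by itself.
single = ArriveCounter()
single.arrive()

# With two expected arrivals, the first one alone does not release waiters.
two = ArriveCounter(arrive_count=2)
two.arrive()
after_first = two.phases_completed
two.arrive()
after_second = two.phases_completed

print(single.phases_completed, after_first, after_second)
```

With the default of 1, a single `arrive()` from the mainloop load warp is enough to release the epilogue load warp.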
arrive
arrive(self)
Signal that mainloop prologue loads are complete.
Called by the mainloop load warp after issuing prologue TMA loads.
wait
wait(self)
Wait for mainloop to signal prologue completion.
Called by the epilogue load warp before starting C loads.
step
step(mut self)
Toggle phase for next tile iteration.
Called after both arrive and wait have completed to prepare for the next output tile's synchronization.
arrive_and_step
arrive_and_step(mut self)
Arrive and advance phase in one call.
Convenience method for mainloop load warp:
load_order_barrier.arrive_and_step()
wait_and_step
wait_and_step(mut self)
Wait and advance phase in one call.
Convenience method for epilogue load warp:
load_order_barrier.wait_and_step()