Mojo module
sync
This module provides GPU synchronization primitives and barriers.
The module includes:
- Block-level synchronization barriers (barrier())
- Warp-level synchronization (syncwarp())
- Memory barriers (mbarrier) for NVIDIA GPUs
- Instruction scheduling controls for AMD GPUs
- Asynchronous copy and bulk transfer synchronization
The synchronization primitives help coordinate execution between threads within thread blocks and warps, and manage memory consistency across different memory spaces.
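As a rough illustration of block-level coordination, the sketch below stages data in shared memory and calls barrier() before any thread reads an element written by a different thread. It is a kernel-only sketch: host-side launch code is omitted. Only barrier() is taken from this page. The kernel name, the TPB constant, and the import paths for thread_idx, block_idx, block_dim, AddressSpace, and stack_allocation are assumptions that may differ between Mojo releases.

```mojo
# Assumed import paths; check them against your Mojo release.
from gpu import barrier, block_dim, block_idx, thread_idx
from gpu.memory import AddressSpace
from memory import UnsafePointer, stack_allocation

# Hypothetical launch configuration: threads per block.
alias TPB = 256


# Hypothetical kernel: reverses the elements handled by each thread block.
fn reverse_within_block(
    dst: UnsafePointer[Float32],
    src: UnsafePointer[Float32],
    size: Int,
):
    # One shared-memory tile per thread block.
    var tile = stack_allocation[
        TPB, Float32, address_space = AddressSpace.SHARED
    ]()
    var local_i = Int(thread_idx.x)
    var global_i = Int(block_dim.x * block_idx.x + thread_idx.x)

    # Stage this thread's element, or a zero pad when out of range.
    if global_i < size:
        tile[local_i] = src[global_i]
    else:
        tile[local_i] = 0

    # Every thread in the block reaches this call, so no thread reads
    # the tile before all writes to it have completed.
    barrier()

    # Read the element written by the mirrored thread in the same block.
    if global_i < size:
        dst[global_i] = tile[TPB - 1 - local_i]
```

The barrier() call sits outside the bounds check on purpose: a block-level barrier that only some threads of a block reach can hang or produce undefined results.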
Structs
- AMDScheduleBarrierMask: Represents different instruction scheduling masks for AMDGPU scheduling instructions.
Functions
- async_copy_arrive: Makes a memory barrier track all prior async copy operations from this thread.
- barrier: Performs a synchronization barrier at the block level.
- cp_async_bulk_commit_group: Commits all prior initiated but uncommitted cp.async.bulk instructions into a cp.async.bulk-group.
- cp_async_bulk_wait_group: Waits for completion of asynchronous bulk memory transfer groups.
- mbarrier_arrive: Signal thread arrival at a shared memory barrier.
- mbarrier_arrive_expect_tx_shared: Configure a shared memory barrier to expect additional async transactions.
- mbarrier_init: Initialize a shared memory barrier for synchronizing multiple threads.
- mbarrier_test_wait: Test if all threads have arrived at the memory barrier.
- mbarrier_try_wait_parity_shared: Wait for completion of a barrier phase with timeout.
- schedule_barrier: Controls instruction scheduling across a barrier point in AMD GPU code.
- schedule_group_barrier: Controls instruction scheduling across a barrier point in AMD GPU code by creating schedule groups.
- syncwarp: Synchronizes threads within a warp using a barrier.