Mojo struct
NamedBarrierSemaphore
@register_passable(trivial)
struct NamedBarrierSemaphore[thread_count: SIMD[int32, 1], id_offset: SIMD[int32, 1], max_num_barriers: SIMD[int32, 1]]
A device-wide semaphore implementation for NVIDIA GPUs with named barriers.
It uses acquire-release logic instead of atomic instructions for inter-CTA synchronization through a shared lock variable. Note that the memory barrier synchronizes warp groups within a CTA. See the [CUTLASS reference implementation](https://github.com/NVIDIA/cutlass/blob/a1aaf2300a8fc3a8106a05436e1a2abad0930443/include/cutlass/arch/barrier.h).
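The split between inter-CTA and intra-CTA synchronization described above can be modeled on the host. The following is a minimal Python sketch, not GPU code and not the Mojo API: `LockWord`, `run_group_thread`, and `demo_leader_wait` are hypothetical names, a `threading.Condition` stands in for the acquire/release accesses to the lock variable, and a `threading.Barrier` stands in for the named barrier. It assumes that only one leader thread per group touches the shared lock and that the rest of the group then meets it at the barrier.

```python
import threading


class LockWord:
    """Hypothetical stand-in for the shared lock variable in global memory."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def store_release(self, value):
        # Models a release store: publish the new value and wake waiters.
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def wait_until_equal(self, expected):
        # Models a spin loop of acquire loads, without busy-waiting.
        with self._cond:
            self._cond.wait_for(lambda: self._value == expected)


def run_group_thread(lock, group_barrier, thread_id, expected, log):
    # Assumption: only the leader thread (thread_id == 0) polls the lock.
    if thread_id == 0:
        lock.wait_until_equal(expected)
    # Every thread in the group then meets at the barrier, so none of them
    # proceeds before the leader has observed the expected lock value.
    group_barrier.wait()
    log.append(thread_id)


def demo_leader_wait(num_threads=4):
    lock = LockWord()
    group_barrier = threading.Barrier(num_threads)
    log = []
    threads = [
        threading.Thread(
            target=run_group_thread, args=(lock, group_barrier, t, 1, log)
        )
        for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    passed_before_release = list(log)  # nothing can pass the barrier yet
    lock.store_release(1)              # another "CTA" signals the group
    for t in threads:
        t.join()
    return passed_before_release, sorted(log)


if __name__ == "__main__":
    print(demo_leader_wait())
```

The model does not capture GPU memory ordering. It only illustrates why a single thread can wait on the lock on behalf of the whole group.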
Implemented traits
AnyType, Copyable, Movable, UnknownDestructibility
Methods
__init__
__init__(lock: UnsafePointer[SIMD[int32, 1]], thread_id: Int) -> Self
Initialize a new Semaphore instance.
Args:
- lock (UnsafePointer[SIMD[int32, 1]]): Pointer to shared lock variable in global memory.
- thread_id (Int): Thread ID within the CTA, used to determine if this thread should perform atomic operations.
state
state(self) -> SIMD[int32, 1]
Get the current state of the semaphore.
Returns:
The current state value of the semaphore.
wait_eq
wait_eq(mut self, id: SIMD[int32, 1], status: SIMD[int32, 1] = 0)
wait_lt
wait_lt(mut self, id: SIMD[int32, 1], count: SIMD[int32, 1] = 0)
arrive_set
arrive_set(self, id: SIMD[int32, 1], status: SIMD[int32, 1] = 0)
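The page gives no descriptions for `wait_eq` and `arrive_set`, so the following host-side Python sketch rests on assumptions inferred from the method names and the CUTLASS reference: `wait_eq` is assumed to block until the semaphore state equals `status`, and `arrive_set` is assumed to store `status` into the lock. `ModelSemaphore` and `run_in_order` are hypothetical names. The `id` argument, which presumably selects the named barrier inside the CTA, is left out of the model, and `wait_lt` is omitted because its exact condition is not documented here.

```python
import threading


class ModelSemaphore:
    """Host-side model of the semaphore state; not the Mojo implementation."""

    def __init__(self):
        self._state = 0
        self._cond = threading.Condition()

    def state(self):
        with self._cond:
            return self._state

    def wait_eq(self, status=0):
        # Assumption: block until the shared state equals `status`.
        with self._cond:
            self._cond.wait_for(lambda: self._state == status)

    def arrive_set(self, status=0):
        # Assumption: publish `status` as the new shared state.
        with self._cond:
            self._state = status
            self._cond.notify_all()


def run_in_order(num_ctas):
    """Each modeled CTA waits for its turn, does its work, then hands off."""
    sem = ModelSemaphore()
    output = []

    def cta(index):
        sem.wait_eq(status=index)         # wait until it is this CTA's turn
        output.append(index)              # the serialized piece of work
        sem.arrive_set(status=index + 1)  # hand the turn to the next CTA

    # Start the threads in reverse order to show that the semaphore,
    # not the launch order, decides who goes first.
    threads = [
        threading.Thread(target=cta, args=(i,))
        for i in reversed(range(num_ctas))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output, sem.state()


if __name__ == "__main__":
    print(run_in_order(5))
```

A turn-passing pattern like this is one plausible use, for example when several CTAs must update the same output tile in a fixed order. Whether the Mojo kernels use the semaphore this way is not stated on this page.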