
Mojo struct

NamedBarrierSemaphore

@register_passable("trivial") struct NamedBarrierSemaphore[thread_count: SIMD[int32, 1], id_offset: SIMD[int32, 1], max_num_barriers: SIMD[int32, 1]]

A device-wide semaphore for NVIDIA GPUs, implemented with named barriers.

It synchronizes CTAs through a shared lock variable using acquire-release memory ordering instead of atomic instructions. Note that the barrier itself only synchronizes warp groups within a single CTA. See the CUTLASS reference implementation: https://github.com/NVIDIA/cutlass/blob/a1aaf2300a8fc3a8106a05436e1a2abad0930443/include/cutlass/arch/barrier.h
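To make the inter-CTA part concrete, the Python sketch below models each CTA as one host thread and the lock variable as a shared integer. It is a CPU-side illustration, not the Mojo implementation: threading.Condition stands in for GPU acquire loads and release stores, and the ordered-handoff pattern (each CTA waits until the lock equals its index, then writes index + 1) is an assumed example use case. The names SharedLock and run_ordered_ctas are hypothetical.

```python
import threading


class SharedLock:
    """CPU stand-in for the int32 lock word that all CTAs share in global memory."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    def wait_until(self, predicate):
        # A GPU thread would spin on acquire loads; here we block on a condition variable.
        with self._cond:
            self._cond.wait_for(lambda: predicate(self._value))
            return self._value

    def store_release(self, value):
        # Plain store under the condition's mutex, standing in for a release store.
        with self._cond:
            self._value = value
            self._cond.notify_all()


def run_ordered_ctas(num_ctas):
    lock = SharedLock(0)
    order = []

    def cta(block_idx):
        lock.wait_until(lambda v: v == block_idx)  # acquire: wait for this CTA's turn
        order.append(block_idx)                    # work that must run in CTA order
        lock.store_release(block_idx + 1)          # release: hand off to the next CTA

    # Start CTAs in reverse to show that ordering comes from the lock, not launch order.
    threads = [threading.Thread(target=cta, args=(i,)) for i in reversed(range(num_ctas))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


print(run_ordered_ctas(4))  # [0, 1, 2, 3]
```

Because only the CTA whose turn it is ever writes the lock, a plain release store is enough here; an atomic read-modify-write would only be needed if several CTAs could update the lock concurrently.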

Implemented traits

AnyType, Copyable, Movable, UnknownDestructibility

Methods

__init__

__init__(lock: UnsafePointer[SIMD[int32, 1]], thread_id: Int) -> Self

Initializes a new NamedBarrierSemaphore instance.

Args:

  • lock (UnsafePointer[SIMD[int32, 1]]): Pointer to the shared lock variable in global memory.
  • thread_id (Int): Thread ID within the CTA, used to determine whether this thread is the one that reads and writes the shared lock variable.
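The Python sketch below models how the thread ID can select a single thread that touches the lock inside one CTA while the rest of the CTA waits at a barrier. It assumes thread 0 is the elected thread (the exact selection rule is not stated on this page), uses threading.Barrier as a stand-in for the named barrier, and uses an extra host thread to play a different CTA that releases the lock. The function name run_cta is hypothetical.

```python
import threading
import time


def run_cta(thread_count, expected=1):
    lock_value = [0]                           # the int32 lock word in "global memory"
    cond = threading.Condition()               # guards lock_value
    barrier = threading.Barrier(thread_count)  # stand-in for the named barrier
    pollers = []                               # thread IDs that touched the lock
    seen_after_barrier = []                    # lock value each thread sees after the barrier

    def thread_body(thread_id):
        # Assumption for this model: thread 0 is the single elected thread.
        if thread_id == 0:
            with cond:
                pollers.append(thread_id)
                cond.wait_for(lambda: lock_value[0] == expected)
        barrier.wait()  # non-elected threads block here until thread 0 arrives
        with cond:
            seen_after_barrier.append(lock_value[0])

    def other_cta():
        time.sleep(0.05)  # another CTA releases the lock a little later
        with cond:
            lock_value[0] = expected
            cond.notify_all()

    threads = [threading.Thread(target=thread_body, args=(i,)) for i in range(thread_count)]
    threads.append(threading.Thread(target=other_cta))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pollers, seen_after_barrier


print(run_cta(4))  # ([0], [1, 1, 1, 1])
```

Electing one thread keeps global-memory polling traffic to a single thread per CTA; the barrier then propagates the result to the rest of the CTA.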

state

state(self) -> SIMD[int32, 1]

Get the current state of the semaphore.

Returns:

The current state value of the semaphore.

wait_eq

wait_eq(mut self, id: SIMD[int32, 1], status: SIMD[int32, 1] = 0)

wait_lt

wait_lt(mut self, id: SIMD[int32, 1], count: SIMD[int32, 1] = 0)

arrive_set

arrive_set(self, id: SIMD[int32, 1], status: SIMD[int32, 1] = 0)
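The hypothetical Python class below mirrors the state, wait_eq, and arrive_set signatures to illustrate one plausible way they combine. Its behavior is an assumption, modeled loosely on the CUTLASS reference linked above: in wait_eq the elected thread (assumed to be thread 0) waits until the lock equals status, then the whole CTA meets at the barrier selected by id; in arrive_set the CTA meets at the barrier first, then the elected thread stores status. wait_lt is left out because its comparison direction is not documented on this page. CTAs are modeled as groups of host threads, and the class and run_ordered_grid are illustrative names, not the Mojo API.

```python
import threading


class NamedBarrierSemaphoreModel:
    """Hypothetical CPU model of state / wait_eq / arrive_set (not the Mojo API)."""

    def __init__(self, lock, cond, barriers, thread_id):
        self._lock = lock                   # one-element list shared by every CTA
        self._cond = cond                   # guards _lock; stands in for acquire/release
        self._barriers = barriers           # per-CTA map: barrier id -> threading.Barrier
        self._wait_thread = thread_id == 0  # assumption: thread 0 is the elected thread
        self._state = -1                    # arbitrary "not yet observed" sentinel

    def state(self):
        # Last lock value observed by this thread in wait_eq.
        return self._state

    def wait_eq(self, id, status=0):  # `id` mirrors the documented parameter name
        if self._wait_thread:
            with self._cond:
                self._cond.wait_for(lambda: self._lock[0] == status)
                self._state = self._lock[0]
        self._barriers[id].wait()  # release the rest of the CTA

    def arrive_set(self, id, status=0):
        self._barriers[id].wait()  # every thread in the CTA has finished its work
        if self._wait_thread:
            with self._cond:
                self._lock[0] = status
                self._cond.notify_all()


def run_ordered_grid(num_ctas, threads_per_cta):
    lock, cond = [0], threading.Condition()
    log, log_guard = [], threading.Lock()

    def thread_body(cta, tid, barriers):
        sem = NamedBarrierSemaphoreModel(lock, cond, barriers, tid)
        sem.wait_eq(0, status=cta)         # wait until it is this CTA's turn
        with log_guard:
            log.append((cta, tid))         # per-CTA "critical section"
        sem.arrive_set(0, status=cta + 1)  # pass the turn to the next CTA

    threads = []
    for cta in reversed(range(num_ctas)):
        barriers = {0: threading.Barrier(threads_per_cta)}  # one barrier set per CTA
        for tid in range(threads_per_cta):
            threads.append(threading.Thread(target=thread_body, args=(cta, tid, barriers)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print([cta for cta, _ in run_ordered_grid(3, 4)])
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Putting the barrier before the store in arrive_set ensures every thread in the CTA has finished its work before the next CTA is allowed to proceed.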
