Mojo struct
NamedBarrierSemaphore
@register_passable(trivial)
struct NamedBarrierSemaphore[thread_count: Int32, id_offset: Int32, max_num_barriers: Int32]
A device-wide semaphore implementation for NVIDIA GPUs with named barriers.
It uses acquire-release logic rather than atomic instructions for inter-CTA synchronization through a shared lock variable. Note that the memory barrier synchronizes warp groups within a CTA. Cutlass reference implementation: https://github.com/NVIDIA/cutlass/blob/a1aaf2300a8fc3a8106a05436e1a2abad0930443/include/cutlass/arch/barrier.h.
Parameters
- thread_count (Int32): Number of threads participating in the barrier.
- id_offset (Int32): Offset for the barrier ID.
- max_num_barriers (Int32): Maximum number of named barriers to use.
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
Movable,
UnknownDestructibility
Aliases
__copyinit__is_trivial
alias __copyinit__is_trivial = True
__del__is_trivial
alias __del__is_trivial = True
__moveinit__is_trivial
alias __moveinit__is_trivial = True
Methods
__init__
__init__(lock: UnsafePointer[Int32], thread_id: Int) -> Self
Initialize a new Semaphore instance.
Args:
- lock (UnsafePointer): Pointer to shared lock variable in global memory.
- thread_id (Int): Thread ID within the CTA, used to determine if this thread should perform atomic operations.
state
state(self) -> Int32
Get the current state of the semaphore.
Returns:
Int32: The current state value of the semaphore.
wait_eq
wait_eq(mut self, id: Int32, status: Int32 = 0)
Waits until the semaphore state equals the specified status.
Args:
wait_lt
wait_lt(mut self, id: Int32, count: Int32 = 0)
Waits until the semaphore state is less than the specified count.
Args:
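The following is a minimal sketch of how a kernel body might construct the semaphore and call the two wait methods, based only on the signatures listed above. The import paths, the parameter values, the barrier ID 0, and the function names wait_for_turn and wait_below are assumptions made for illustration and are not taken from this page.

```mojo
# Sketch only: these import paths are assumptions, not taken from this page.
from gpu import thread_idx
from gpu.semaphore import NamedBarrierSemaphore
from memory import UnsafePointer

# Hypothetical parameter values, chosen only for illustration.
alias THREADS: Int32 = 128
alias ID_OFFSET: Int32 = 4
alias MAX_BARRIERS: Int32 = 8


fn wait_for_turn(lock: UnsafePointer[Int32], turn: Int32):
    # `var` because wait_eq takes `mut self`.
    var sema = NamedBarrierSemaphore[THREADS, ID_OFFSET, MAX_BARRIERS](
        lock, Int(thread_idx.x)
    )
    # Barrier ID 0 is a hypothetical choice.
    # Returns once the semaphore state equals `turn`.
    sema.wait_eq(0, turn)


fn wait_below(lock: UnsafePointer[Int32], limit: Int32):
    var sema = NamedBarrierSemaphore[THREADS, ID_OFFSET, MAX_BARRIERS](
        lock, Int(thread_idx.x)
    )
    # Returns once the semaphore state is less than `limit`.
    sema.wait_lt(0, limit)
```

Both functions are meant to be called from device code by every thread in the CTA, with lock pointing to an Int32 in global memory that other CTAs update. How the lock value is advanced is not covered by this sketch.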
arrive_set