Mojo module
cluster
This module provides low-level NVIDIA GPU cluster synchronization primitives for SM90+ architectures.
The module implements thread block cluster operations that enable efficient communication and synchronization between thread blocks (CTAs) within a cluster on NVIDIA Hopper and newer GPU architectures.
All functions are constrained to NVIDIA SM90+ GPUs and will raise an error if used on unsupported hardware.
Note: These are low-level primitives that correspond directly to PTX/NVVM instructions and should be used with careful consideration of the underlying hardware synchronization mechanisms.
Functions
- block_rank_in_cluster: Returns the unique identifier (rank) for the current thread block within its cluster.
- cluster_arrive: Signals arrival at a cluster synchronization point with memory ordering guarantees.
- cluster_arrive_relaxed: Signals arrival at a cluster synchronization point with relaxed memory ordering.
- cluster_sync: Performs a full cluster synchronization with memory ordering guarantees.
- cluster_sync_relaxed: Performs a full cluster synchronization with relaxed memory ordering.
- cluster_wait: Waits for all thread blocks in the cluster to arrive at the synchronization point.
- elect_one_sync: Elects a single thread within a warp to perform an operation.
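The functions listed above might be combined in a kernel roughly as follows. This is a minimal sketch, not a definitive implementation. It assumes the import path is gpu.cluster (inferred from the module name), that the functions take no arguments, and that elect_one_sync returns a Bool. The kernel name split_barrier_kernel is hypothetical. The host-side launch, which must set a cluster dimension on an SM90+ device, is not shown.

```mojo
# Assumption: the module is importable as gpu.cluster.
from gpu.cluster import (
    block_rank_in_cluster,
    cluster_arrive,
    cluster_wait,
    elect_one_sync,
)


# Hypothetical kernel name, used only for illustration.
fn split_barrier_kernel():
    # Rank of this thread block (CTA) within its cluster.
    var rank = block_rank_in_cluster()

    # Signal that this block has reached the synchronization point.
    cluster_arrive()

    # Work that does not depend on other blocks in the cluster
    # could be placed here, between arrive and wait.

    # Block until every thread block in the cluster has arrived.
    cluster_wait()

    # Within each warp, a single elected thread takes this branch.
    if elect_one_sync():
        _ = rank  # placeholder for per-warp work that uses the rank
```

Splitting the barrier into cluster_arrive and cluster_wait lets independent work overlap with the synchronization. When there is no such work, cluster_sync performs the full synchronization in a single call.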