Mojo module

cluster

This module provides low-level NVIDIA GPU cluster synchronization primitives for SM90+ architectures.

The module implements thread block cluster operations that let thread blocks (CTAs) within the same cluster communicate and synchronize efficiently on NVIDIA Hopper (SM90) and newer GPUs.

All functions are constrained to NVIDIA SM90+ GPUs and will raise an error if used on unsupported hardware.

Note: These are low-level primitives that correspond directly to PTX/NVVM instructions. Use them only with a clear understanding of the underlying hardware synchronization mechanisms.

Functions

  • block_rank_in_cluster: Returns the unique identifier (rank) for the current thread block within its cluster.
  • cluster_arrive: Signals arrival at a cluster synchronization point with memory ordering guarantees.
  • cluster_arrive_relaxed: Signals arrival at a cluster synchronization point with relaxed memory ordering.
  • cluster_sync: Performs a full cluster synchronization with memory ordering guarantees.
  • cluster_sync_relaxed: Performs a full cluster synchronization with relaxed memory ordering.
  • cluster_wait: Waits for all thread blocks in the cluster to arrive at the synchronization point.
  • elect_one_sync: Elects a single thread within a warp to perform an operation.
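
The functions listed above might be combined along the following lines. This is a minimal sketch, not a reference implementation: it assumes the module is importable as `gpu.cluster`, that each function takes no arguments, and that `elect_one_sync` returns a boolean. The kernel name `cluster_kernel_sketch` is hypothetical, and the host-side launch code (which must set a cluster dimension on an SM90+ device) is omitted.

```mojo
from gpu.cluster import (
    block_rank_in_cluster,
    cluster_arrive,
    cluster_sync,
    cluster_wait,
    elect_one_sync,
)


# Hypothetical kernel body; it must be launched with a cluster
# dimension on an SM90+ GPU (launch code not shown).
fn cluster_kernel_sketch():
    # Rank of this thread block (CTA) within its cluster.
    var rank = block_rank_in_cluster()

    # Split-phase barrier: signal arrival first ...
    cluster_arrive()
    # ... do work here that does not depend on other blocks ...
    # ... then block until every block in the cluster has arrived.
    cluster_wait()

    # The rank is uniform across a block, so each warp stays converged
    # when it reaches the election below.
    if rank == 0:
        # Exactly one thread per warp takes this branch.
        if elect_one_sync():
            pass  # single-threaded work, e.g. issuing one async copy

    # Combined arrive + wait in a single call.
    cluster_sync()
```

Splitting the barrier into `cluster_arrive` and `cluster_wait` lets a block overlap independent work with the time other blocks need to reach the synchronization point. `cluster_sync` is the simpler choice when there is no such work to overlap.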