Mojo module

sync

This module includes intrinsics for NVIDIA GPU synchronization instructions.

Functions

  • barrier: Performs a synchronization barrier across the threads of a block (equivalent to __syncthreads in CUDA).
  • cp_async_bulk_commit_group: Commits all previously initiated but uncommitted cp.async.bulk instructions into a cp.async.bulk-group.
  • cp_async_bulk_wait_group: Causes the executing thread to wait until only N or fewer of the most recent bulk async-groups are pending and all prior bulk async-groups committed by the executing thread are complete. When N is 0, the executing thread waits for all prior bulk async-groups to complete.
  • mbarrier: Makes the mbarrier object track all prior copy async operations initiated by the executing thread.
  • mbarrier_arrive: Commits the arrival of a thread to a shared memory barrier.
  • mbarrier_arrive_expect_tx_shared: Performs an expect-tx operation on a shared memory barrier.
  • mbarrier_init: Initializes a shared memory barrier for N threads.
  • mbarrier_test_wait: Tests whether the wait on the memory barrier has completed.
  • mbarrier_try_wait_parity_shared: Waits on a shared memory barrier until the phase completes or the tick count expires.
  • syncwarp: Causes all threads to wait until all lanes specified by the warp mask reach the synchronization point.
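
To make the two scopes of synchronization concrete, here is a minimal sketch in CUDA C++ rather than Mojo, since this page does not list the Mojo signatures. It uses `__syncthreads()`, which the list above names as the equivalent of `barrier`, and `__syncwarp()`, which is assumed here to be the CUDA counterpart of `syncwarp`. The names `block_sum`, `sum_ones_on_gpu`, and `kThreads` are hypothetical and chosen only for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Threads per block (an assumption of this sketch: a power of two, at least 64).
constexpr int kThreads = 256;

// Each block sums kThreads consecutive floats and writes one result.
__global__ void block_sum(const float* in, float* out) {
    __shared__ float tile[kThreads];
    const unsigned tid = threadIdx.x;

    tile[tid] = in[blockIdx.x * kThreads + tid];
    // Block-wide barrier: every thread's write to tile is visible past this point.
    __syncthreads();

    // Tree reduction across warps. Each step reads values written by other
    // warps, so a block-wide barrier is needed after every step.
    for (unsigned stride = kThreads / 2; stride >= 32; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    // Only the first warp is still working, so a warp-level sync is enough.
    // All 32 lanes of that warp reach __syncwarp(), matching the default full mask.
    if (tid < 32) {
        for (unsigned stride = 16; stride > 0; stride >>= 1) {
            if (tid < stride) tile[tid] += tile[tid + stride];
            __syncwarp();
        }
        if (tid == 0) out[blockIdx.x] = tile[0];
    }
}

// Host helper: sums kThreads ones on the GPU with a single block.
// Error checking is omitted to keep the sketch short.
float sum_ones_on_gpu() {
    std::vector<float> host_in(kThreads, 1.0f);
    float host_out = 0.0f;
    float* dev_in = nullptr;
    float* dev_out = nullptr;

    cudaMalloc(&dev_in, kThreads * sizeof(float));
    cudaMalloc(&dev_out, sizeof(float));
    cudaMemcpy(dev_in, host_in.data(), kThreads * sizeof(float),
               cudaMemcpyHostToDevice);

    block_sum<<<1, kThreads>>>(dev_in, dev_out);

    // This copy waits for the kernel on the default stream to finish.
    cudaMemcpy(&host_out, dev_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return host_out;
}

int main() {
    std::printf("block sum = %.1f\n", sum_ones_on_gpu());
    return 0;
}
```

The block-wide barrier is used while several warps still exchange data through shared memory. The cheaper warp-level sync takes over once only one warp remains active. The mbarrier and cp.async.bulk functions listed above are not covered by this sketch.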