Mojo function

multi_gpu_barrier

multi_gpu_barrier[ngpus: Int, is_start: Bool, need_fence: Bool = False](rank_sigs: StaticTuple[UnsafePointer[Signal], 8], self_sg: UnsafePointer[Signal], my_rank: Int)

Implements a barrier synchronization across multiple GPUs.

Arguments:

  • rank_sigs (StaticTuple[UnsafePointer[Signal], 8]): Signal pointers for all GPUs.
  • self_sg (UnsafePointer[Signal]): Signal pointer for the current GPU.
  • my_rank (Int): Rank of the current GPU.

Uses atomic counters and memory fences to ensure that all GPUs reach the barrier before any of them proceeds. The implementation is ported from vLLM's multi_gpu_barrier: https://github.com/vllm-project/vllm/blob/main/csrc/custom_all_reduce.cuh#L169-L198

Parameters:

  • ngpus (Int): Number of GPUs participating in the barrier.
  • is_start (Bool): Whether this is the start barrier.
  • need_fence (Bool): Whether a memory fence is needed. Defaults to False.
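
The counter-based synchronization described above can be illustrated with a CPU-only Python simulation. This is a minimal sketch of one plausible counter handshake, not the Mojo or vLLM source. The Signal class, its self_counter and peer_counter fields, the two-slot alternation, and the names multi_gpu_barrier_sim and run_demo are all assumptions made for illustration. Threads stand in for GPUs, and ordinary Python list writes stand in for the fenced GPU stores.

```python
import threading
import time


class Signal:
    """Hypothetical stand-in for one GPU's signal buffer (not the Mojo type)."""

    def __init__(self, ngpus):
        # How many barriers this rank has entered, tracked once per peer.
        self.self_counter = [0] * ngpus
        # Two alternating slots per peer, so a rank that races ahead into the
        # next barrier never overwrites a value a slower rank still waits on.
        self.peer_counter = [[0] * ngpus for _ in range(2)]


def multi_gpu_barrier_sim(rank_sigs, my_rank):
    """Block until every simulated rank has entered the same barrier."""
    ngpus = len(rank_sigs)
    self_sg = rank_sigs[my_rank]

    # Phase 1: publish the new counter value into every peer's buffer.
    expected = []
    for peer in range(ngpus):
        self_sg.self_counter[peer] += 1
        val = self_sg.self_counter[peer]
        rank_sigs[peer].peer_counter[val % 2][my_rank] = val
        expected.append(val)

    # Phase 2: spin until every peer has published the same value to us.
    for peer in range(ngpus):
        val = expected[peer]
        while self_sg.peer_counter[val % 2][peer] != val:
            time.sleep(0)  # yield to other threads while spinning


def run_demo(ngpus=4, rounds=50):
    """Run `rounds` barriers on `ngpus` threads; return ordering violations."""
    rank_sigs = [Signal(ngpus) for _ in range(ngpus)]
    arrived = [[False] * ngpus for _ in range(rounds)]
    violations = []

    def worker(rank):
        for r in range(rounds):
            arrived[r][rank] = True  # work done before the barrier
            multi_gpu_barrier_sim(rank_sigs, rank)
            # After the barrier, every rank must already have arrived.
            if not all(arrived[r]):
                violations.append((rank, r))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(ngpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations


if __name__ == "__main__":
    print(run_demo())
```

Calling run_demo(4, 50) runs four simulated ranks through 50 consecutive barriers and returns the list of ordering violations it observed, which should be empty if the barrier holds. On real hardware, the plain list writes and reads in this sketch would need to be atomic or fenced memory operations.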