Mojo function
multi_gpu_barrier
multi_gpu_barrier[ngpus: Int, is_start: Bool, need_fence: Bool = False](rank_sigs: StaticTuple[UnsafePointer[Signal], 8], self_sg: UnsafePointer[Signal], my_rank: Int)
Implements a barrier synchronization across multiple GPUs.
Arguments:
- rank_sigs: Signal pointers for all GPUs.
- self_sg: Signal pointer for the current GPU.
- my_rank: Rank of the current GPU.
Uses atomic counters and memory fences to ensure that all GPUs reach the barrier before any of them proceeds. The implementation is ported from vLLM's multi_gpu_barrier: https://github.com/vllm-project/vllm/blob/main/csrc/custom_all_reduce.cuh#L169-L198
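The counter-and-flag idea described above can be modeled on the host. The following is a minimal Python sketch, not the Mojo or vLLM implementation: threads stand in for GPUs, and the `Signal` class, its `self_counter` and `peer_counter` fields, the two alternating flag banks, and the `barrier` and `run` helpers are all assumptions made for illustration. Memory fences are omitted, and the sketch does not model `is_start` or `need_fence`.

```python
import threading
import time


class Signal:
    """Hypothetical host-side stand-in for one GPU's signal buffer."""

    def __init__(self, ngpus):
        # self_counter[peer]: number of barriers this rank has entered,
        # kept once per peer slot.
        self.self_counter = [0] * ngpus
        # peer_counter[bank][peer]: flag written by `peer` into this rank's
        # buffer. Two banks alternate so that a rank already in the next
        # barrier cannot overwrite a flag a slower rank is still reading.
        self.peer_counter = [[0] * ngpus for _ in range(2)]


def barrier(rank_sigs, my_rank):
    """Block until every rank in rank_sigs has entered the same barrier."""
    self_sg = rank_sigs[my_rank]
    ngpus = len(rank_sigs)
    vals = []
    # Announce arrival: write the new counter value into every peer's buffer.
    for peer in range(ngpus):
        self_sg.self_counter[peer] += 1
        val = self_sg.self_counter[peer]
        rank_sigs[peer].peer_counter[val % 2][my_rank] = val
        vals.append(val)
    # Wait until every peer has announced the same value in our buffer.
    for peer in range(ngpus):
        val = vals[peer]
        while self_sg.peer_counter[val % 2][peer] != val:
            time.sleep(0)  # yield to other threads while spinning


def run(ngpus=4, rounds=50):
    """Run `rounds` barriers on `ngpus` threads and report any overtaking."""
    rank_sigs = [Signal(ngpus) for _ in range(ngpus)]
    arrived = [0] * ngpus  # arrived[r]: last round rank r has entered
    violations = []

    def worker(rank):
        for rnd in range(1, rounds + 1):
            arrived[rank] = rnd
            barrier(rank_sigs, rank)
            # Past the barrier, no rank may still be in an earlier round.
            if min(arrived) < rnd:
                violations.append((rank, rnd))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(ngpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations
```

Calling `run()` returns an empty list when no simulated rank got past a barrier before all others had entered it. On real hardware the plain list reads and writes would have to be atomic or volatile accesses, with fences where required, which is the part this sketch leaves out.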
Parameters:
- ngpus (Int): Number of GPUs participating in the barrier.
- is_start (Bool): Whether this is the start barrier.
- need_fence (Bool): Whether a memory fence is needed.