Mojo module
all_reduce
Aliases
- flag_t = uint32
- MAX_BLOCK = 36
- MAX_GPUS = 8
Structs
Functions
- all_reduce: Main entry point for performing all-reduce across multiple GPUs.
- all_reduce_naive: Performs all-reduce across GPUs without using peer-to-peer access.
- all_reduce_p2p: Performs all-reduce using peer-to-peer access between GPUs.
- all_reduce_p2p_kernel: Kernel implementing all-reduce using peer-to-peer access between GPUs.
- can_enable_p2p: Checks and enables peer-to-peer access between all GPU pairs.
- multi_gpu_barrier: Implements a barrier synchronization across multiple GPUs.
- naive_reduce_kernel: A simple reduction kernel that adds source buffer values to the destination buffer.
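The end result these functions describe can be illustrated with a small CPU-only Python simulation. This is a sketch under stated assumptions, not the Mojo API. Threads stand in for GPUs and Python lists stand in for device buffers. `threading.Barrier` plays the role of `multi_gpu_barrier`. Addition is the reduction, following the description of `naive_reduce_kernel`. The helpers `naive_reduce` and `simulate_all_reduce` are hypothetical names. The point is that after an all-reduce, every participating GPU holds the same elementwise sum of all inputs.

```python
import threading

MAX_GPUS = 8  # mirrors the module's MAX_GPUS alias


def naive_reduce(dst, src):
    """Hypothetical CPU analogue of naive_reduce_kernel: dst[i] += src[i]."""
    for i in range(len(dst)):
        dst[i] += src[i]


def simulate_all_reduce(inputs):
    """Simulate a sum all-reduce: every simulated GPU ends with the elementwise sum.

    inputs: one equal-length list per simulated GPU (hypothetical layout).
    Returns one output list per simulated GPU.
    """
    ngpus = len(inputs)
    if not 1 <= ngpus <= MAX_GPUS:
        raise ValueError(f"expected 1..{MAX_GPUS} GPUs, got {ngpus}")
    n = len(inputs[0])
    if any(len(buf) != n for buf in inputs):
        raise ValueError("all input buffers must have the same length")

    outputs = [None] * ngpus
    # Stand-in for multi_gpu_barrier: no rank starts reading peers' buffers
    # until every rank has reached this point.
    barrier = threading.Barrier(ngpus)

    def rank_worker(rank):
        barrier.wait()
        acc = list(inputs[rank])  # start from this rank's own data
        for peer in range(ngpus):
            if peer != rank:
                # A P2P implementation would read the peer's device memory
                # directly; here every thread just reads the shared Python list.
                naive_reduce(acc, inputs[peer])
        outputs[rank] = acc

    threads = [threading.Thread(target=rank_worker, args=(r,)) for r in range(ngpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs


if __name__ == "__main__":
    result = simulate_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
    print(result)  # [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

The simulation does not model how data actually moves between devices, so the naive and peer-to-peer paths collapse into the same shared-memory reads here. It only checks the outcome that both `all_reduce_naive` and `all_reduce_p2p` are described as producing.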