Mojo module
sync
comptime values
MAX_GPUS
comptime MAX_GPUS = 8
Maximum number of GPUs supported in the allreduce implementation.
This constant sets the upper bound on the number of GPUs supported by this algorithm.
MAX_NUM_BLOCKS_UPPER_BOUND
comptime MAX_NUM_BLOCKS_UPPER_BOUND = 512
Maximum number of thread blocks to use for reduction kernels.
This value was tuned empirically by grid search across different GPU architectures. It is optimal for A100 GPUs; H100 GPUs may benefit from more blocks to fully saturate NVLink bandwidth.
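As a minimal sketch of how such a cap might be applied, the snippet below clamps a computed grid size to the upper bound. The helper `capped_num_blocks` and its parameters are hypothetical and are not part of the `sync` module; the constant is redefined locally only so the example is self-contained.

```mojo
from math import ceildiv

# Mirrors the documented constant; redefined here so the sketch stands alone.
comptime MAX_NUM_BLOCKS_UPPER_BOUND = 512


# Hypothetical helper (an assumption for illustration, not a sync API).
fn capped_num_blocks(num_elements: Int, elements_per_block: Int) -> Int:
    # Blocks needed to cover the input, clamped to the documented upper bound.
    var needed = ceildiv(num_elements, elements_per_block)
    return min(needed, MAX_NUM_BLOCKS_UPPER_BOUND)


fn main():
    # A large input needs more blocks than the cap allows, so it is clamped.
    print(capped_num_blocks(1 << 20, 1024))
    # A small input stays below the cap and is left unchanged.
    print(capped_num_blocks(4096, 1024))
```

When the grid is capped this way, each block has to cover more than one chunk of the input, which is typically done with a strided loop inside the kernel.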
Structs
- Signal: A synchronization primitive for coordinating GPU thread blocks across multiple devices.
Functions
- can_enable_p2p: If peer-to-peer access is supported, enables it between all GPU pairs.
- group_end:
- group_start: