Mojo module

sync

comptime values

MAX_GPUS

comptime MAX_GPUS = 8

Maximum number of GPUs supported in the allreduce implementation.

This constant sets the upper bound on the number of GPUs the algorithm supports.

MAX_NUM_BLOCKS_UPPER_BOUND

comptime MAX_NUM_BLOCKS_UPPER_BOUND = 512

Maximum number of thread blocks to use for reduction kernels.

This value was tuned empirically by grid search across different GPU architectures. It is optimal for A100 GPUs; H100 GPUs may benefit from more blocks to fully saturate NVLink bandwidth.
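As a rough illustration of how bounds like these are typically applied, the sketch below caps a requested grid size and checks a device count. It redeclares the two constants locally so it stands alone. The helper names `clamp_num_blocks` and `is_supported_gpu_count` are hypothetical and not part of this module, and the module's own kernels may apply the limits differently.

```mojo
# Local copies of the documented constants, so this sketch is self-contained.
comptime MAX_GPUS = 8
comptime MAX_NUM_BLOCKS_UPPER_BOUND = 512


# Hypothetical helper (not part of the module): cap a requested grid size
# at the documented upper bound on thread blocks.
fn clamp_num_blocks(requested: Int) -> Int:
    return min(requested, MAX_NUM_BLOCKS_UPPER_BOUND)


# Hypothetical helper (not part of the module): check a device count
# against the documented GPU limit.
fn is_supported_gpu_count(ngpus: Int) -> Bool:
    return ngpus >= 1 and ngpus <= MAX_GPUS


def main():
    print(clamp_num_blocks(1024))      # request above the cap
    print(clamp_num_blocks(216))       # request below the cap
    print(is_supported_gpu_count(8))   # at the limit
    print(is_supported_gpu_count(16))  # over the limit
```

A request above the cap is reduced to `MAX_NUM_BLOCKS_UPPER_BOUND`, and a smaller request passes through unchanged.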

Structs

  • Signal: A synchronization primitive for coordinating GPU thread blocks across multiple devices.

Functions
