Mojo function

allreduce

allreduce[dtype: DType, rank: Int, ngpus: Int, output_lambda: OptionalReg[fn[dtype: DType, rank: Int, width: Int, *, alignment: Int](IndexList[rank], SIMD[dtype, width]) capturing -> None] = None](input: NDBuffer[dtype, rank, MutAnyOrigin], output: NDBuffer[dtype, rank, MutAnyOrigin], gpu_rank: Int, ctx: DeviceContext)

Per-GPU allreduce for use in multi-threaded contexts.

Currently requires a prior single-threaded call to init_comms, because a thread-safe version is not yet implemented.
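The calling pattern described above (one single-threaded setup step, then one allreduce call per GPU from its own thread) can be illustrated with a small CPU-only simulation. This is a conceptual sketch in Python, not the Mojo API: `init_comms_sim`, `allreduce_sim`, and `run_demo` are hypothetical names, plain lists stand in for the `NDBuffer` arguments, and an element-wise sum is assumed as the reduction.

```python
import threading


def init_comms_sim(ngpus):
    """Hypothetical stand-in for the one-time, single-threaded setup step."""
    return {
        "inputs": [None] * ngpus,              # one slot per simulated GPU
        "barrier": threading.Barrier(ngpus),   # shared by all worker threads
    }


def allreduce_sim(comms, input_buf, output_buf, gpu_rank):
    """Per-rank call: publish the local buffer, wait, then sum all buffers."""
    comms["inputs"][gpu_rank] = input_buf
    comms["barrier"].wait()  # every rank has published its input
    for i in range(len(output_buf)):
        # Assumed reduction: element-wise sum across all ranks.
        output_buf[i] = sum(buf[i] for buf in comms["inputs"])
    comms["barrier"].wait()  # no rank returns until all ranks have finished reading


def run_demo(ngpus=4, length=3):
    # Setup happens once, before any worker thread starts.
    comms = init_comms_sim(ngpus)
    inputs = [[float(rank + 1)] * length for rank in range(ngpus)]
    outputs = [[0.0] * length for _ in range(ngpus)]
    threads = [
        threading.Thread(
            target=allreduce_sim,
            args=(comms, inputs[rank], outputs[rank], rank),
        )
        for rank in range(ngpus)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs
```

In this sketch every simulated rank ends up with the same reduced values in its own output buffer, which is the defining property of an allreduce. Doing the shared setup before the threads start mirrors the requirement that init_comms be called from a single thread first.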