Mojo module
reducescatter
Multi-GPU reducescatter implementation for distributed tensor reduction across GPUs.
comptime values
elementwise_epilogue_type
comptime elementwise_epilogue_type = fn[dtype: DType, rank: Int, width: Int, *, alignment: Int](IndexList[rank], SIMD[dtype, width]) capturing -> None
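The signature above describes a callback that takes an index (`IndexList[rank]`) and a vector of `width` values (`SIMD[dtype, width]`) and returns nothing. A rough conceptual model of such a callback, written in Python rather than Mojo, is sketched below. It assumes the callback is used to store each vector of results at the given coordinates; this page does not state how the library invokes it, and `make_store_epilogue` and `output` are hypothetical names used only for illustration.

```python
def make_store_epilogue(output):
    """Build a callback that writes a vector of values into a captured 2-D buffer."""

    def epilogue(index, values):
        # index: (row, col) coordinates of the first element (stands in for IndexList[2]).
        # values: a short list of elements (stands in for SIMD[dtype, width]).
        row, col = index
        for offset, value in enumerate(values):
            output[row][col + offset] = value

    return epilogue


# A 2 x 4 output buffer captured by the callback (assumption: store-style epilogue).
output = [[0] * 4 for _ in range(2)]
epilogue = make_store_epilogue(output)

epilogue((0, 0), [1, 2])  # write two values starting at row 0, column 0
epilogue((1, 2), [7, 8])  # write two values starting at row 1, column 2
```

The closure over `output` plays the role of `capturing` in the Mojo signature: the callback returns `None` and has its effect only through the state it captures.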
Structs
Functions
- reducescatter: Per-device reducescatter operation.
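To make the term concrete, here is a minimal sketch of what a reduce-scatter computes in general, simulated on plain Python lists rather than GPUs. It is not this module's API: the function name `reduce_scatter`, the sum reduction, and the even chunking are assumptions chosen for illustration. Each simulated device contributes a buffer of equal length, the buffers are reduced elementwise, and each device keeps only its own contiguous chunk of the result.

```python
def reduce_scatter(inputs):
    """Simulate a sum reduce-scatter over len(inputs) devices.

    inputs[d] is the full input buffer held by device d.
    Returns outputs, where outputs[d] is the reduced chunk owned by device d.
    """
    num_devices = len(inputs)
    length = len(inputs[0])
    if any(len(buf) != length for buf in inputs):
        raise ValueError("all device buffers must have the same length")
    if length % num_devices != 0:
        raise ValueError("buffer length must divide evenly across devices")

    chunk = length // num_devices
    outputs = []
    for rank in range(num_devices):
        start = rank * chunk
        # Device `rank` owns elements [start, start + chunk) of the reduced result.
        outputs.append(
            [sum(buf[i] for buf in inputs) for i in range(start, start + chunk)]
        )
    return outputs


# Two simulated devices, each holding four elements.
device_inputs = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
]
device_outputs = reduce_scatter(device_inputs)
```

In this sketch device 0 ends up with the sum of the first half of the buffers and device 1 with the sum of the second half, which is what "per-device" refers to: each device produces only its own slice of the reduced tensor.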