Mojo function
matmul_allreduce
matmul_allreduce[ngpus: Int, partition_dim: Int, outputs_lambda: fn[Int, DType, Int, Int, Int](IndexList[$2], SIMD[$1, $3]) capturing -> None, a_dtype: DType, b_dtype: DType, out_dtype: DType, a_static_shape: DimList, b_static_shape: DimList, c_static_shape: DimList, out_static_shape: DimList, overlap_with_dpl: Bool = True](a_buffers: InlineArray[NDBuffer[a_dtype, 2, MutableAnyOrigin, a_static_shape], ngpus], b_buffers: InlineArray[NDBuffer[b_dtype, 2, MutableAnyOrigin, b_static_shape], ngpus], c_temp_buffers: InlineArray[NDBuffer[out_dtype, 2, MutableAnyOrigin, c_static_shape], ngpus], output_buffers: InlineArray[NDBuffer[out_dtype, 2, MutableAnyOrigin, out_static_shape], ngpus], rank_sigs: InlineArray[UnsafePointer[Signal], 8], ctxs: List[DeviceContext], num_partitions: ValOrDim[dim])
Performs C = matmul(A, B^T) followed with Out = allreduce(C) operation across multiple GPUs. Split the A or B and C matrices into num_partitions
submatrices at dimension partition_dim
. This way we can perform num_partitions
independent matmul + allreduce kernels, and overlap some of the computation.
matmul_allreduce[ngpus: Int, outputs_lambda: fn[Int, DType, Int, Int, Int](IndexList[$2], SIMD[$1, $3]) capturing -> None, a_dtype: DType, b_dtype: DType, out_dtype: DType, a_static_shape: DimList, b_static_shape: DimList, c_static_shape: DimList, out_static_shape: DimList](a_buffers: InlineArray[NDBuffer[a_dtype, 2, MutableAnyOrigin, a_static_shape], ngpus], b_buffers: InlineArray[NDBuffer[b_dtype, 2, MutableAnyOrigin, b_static_shape], ngpus], c_temp_buffers: InlineArray[NDBuffer[out_dtype, 2, MutableAnyOrigin, c_static_shape], ngpus], output_buffers: InlineArray[NDBuffer[out_dtype, 2, MutableAnyOrigin, out_static_shape], ngpus], rank_sigs: InlineArray[UnsafePointer[Signal], 8], ctxs: List[DeviceContext])
Performs C = matmul(A, B^T) followed with Out = allreduce(C) operation across multiple GPUs. The implementation might potentially split A / B / C matrices and overlap computation to speedup performance.
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!