Mojo module

distributed_matmul

Functions

matmul_allreduce: Performs C = matmul(A, B^T) followed by Out = allreduce(C) across multiple GPUs. The function splits either A or B, together with C, into num_partitions submatrices along dimension partition_dim. Each submatrix then gets its own independent matmul + allreduce kernel pair, so the operation runs as num_partitions such pairs and some of the computation can overlap.
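The partitioning idea can be illustrated with a small host-side simulation. This is only a sketch in plain Python, not the Mojo kernel and not its signature: the helper names (`matmul_bt`, `allreduce_sum`, `split_rows`, `matmul_allreduce_partitioned`) are hypothetical, each "device" is just a list entry, and it assumes that `partition_dim == 0` splits A and C along their rows while `partition_dim == 1` splits B along its rows, which become the columns of C. It only checks that the partitioned result matches the unpartitioned one; it does not model the overlap itself.

```python
def matmul_bt(a, b):
    """Return a @ b^T for nested lists; a is M x K, b is N x K."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def allreduce_sum(per_device):
    """Elementwise sum of same-shaped matrices, one per simulated device."""
    rows, cols = len(per_device[0]), len(per_device[0][0])
    return [[sum(m[i][j] for m in per_device) for j in range(cols)]
            for i in range(rows)]


def split_rows(m, num_partitions):
    """Split a matrix into num_partitions equal row blocks."""
    assert len(m) % num_partitions == 0
    step = len(m) // num_partitions
    return [m[i:i + step] for i in range(0, len(m), step)]


def matmul_allreduce_reference(a_per_dev, b_per_dev):
    """Unpartitioned version: one matmul per device, then one allreduce."""
    return allreduce_sum([matmul_bt(a, b) for a, b in zip(a_per_dev, b_per_dev)])


def matmul_allreduce_partitioned(a_per_dev, b_per_dev, num_partitions, partition_dim):
    """Assumed semantics: dim 0 splits A (rows of C), dim 1 splits B (columns of C)."""
    out_blocks = []
    for p in range(num_partitions):
        partials = []
        for a, b in zip(a_per_dev, b_per_dev):
            if partition_dim == 0:
                a_p, b_p = split_rows(a, num_partitions)[p], b
            else:
                a_p, b_p = a, split_rows(b, num_partitions)[p]
            partials.append(matmul_bt(a_p, b_p))
        # Each partition has its own independent matmul + allreduce step.
        out_blocks.append(allreduce_sum(partials))
    if partition_dim == 0:
        # Stack the row blocks of C.
        return [row for block in out_blocks for row in block]
    # Concatenate the column blocks of C row by row.
    return [sum((block[i] for block in out_blocks), [])
            for i in range(len(out_blocks[0]))]


# Two simulated devices, each holding its own 2 x 2 operands.
a_per_dev = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
b_per_dev = [[[1, 0], [0, 1]], [[1, 2], [3, 4]]]

reference = matmul_allreduce_reference(a_per_dev, b_per_dev)
by_rows = matmul_allreduce_partitioned(a_per_dev, b_per_dev, 2, 0)
by_cols = matmul_allreduce_partitioned(a_per_dev, b_per_dev, 2, 1)
```

Because every block of C depends only on the matching block of A (or B), the per-partition results can be computed and reduced independently, which is what makes the overlap described above possible.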
