Mojo function
allreduce_2stage_quickreduce_tile
allreduce_2stage_quickreduce_tile[
    dtype: DType,
    rank: Int,
    ngpus: Int,
    *,
    BLOCK_SIZE: Int,
    output_lambda: elementwise_epilogue_type,
    atom_size: Int,
    use_bufferio: Bool
](
    result: NDBuffer[dtype, rank, MutAnyOrigin],
    local_src: LegacyUnsafePointer[Scalar[dtype], address_space=AddressSpace.GLOBAL if is_amd_gpu() else AddressSpace.GENERIC],
    rank_sigs: InlineArray[LegacyUnsafePointer[Signal], 8],
    num_elements: Int,
    my_rank: Int,
    tile: Int,
    num_tiles: Int,
    iteration: Int
)