Mojo function
broadcast
broadcast[dtype: DType, in_layout: TensorLayout, in_origin: Origin[mut=in_origin.mut], //, ngpus: Int, pdl_level: PDLLevel = PDLLevel(), use_multimem: Bool = False](input_tensor: TileTensor[dtype, in_layout, in_origin], output_tensor: TileTensor[dtype, output_tensor.LayoutType, output_tensor.origin, address_space=output_tensor.address_space, linear_idx_type=output_tensor.linear_idx_type, element_size=output_tensor.element_size], rank_sigs: InlineArray[UnsafePointer[Signal, MutAnyOrigin], 8], ctx: DeviceContext, root: Int, _max_num_blocks: Optional[Int] = None)
Per-GPU broadcast for use in multi-threaded contexts.
Currently requires a prior single-threaded call to init_comms, as a thread-safe version is not yet implemented.
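A minimal call-site sketch, derived only from the signature above. The surrounding setup (the tensors, the `rank_sigs` array, and the per-GPU `DeviceContext`) is assumed to have been created beforehand, e.g. during the required single-threaded init_comms phase; variable names here are illustrative, not part of the API.

```mojo
# Hypothetical usage sketch -- one call per GPU, typically issued from the
# thread that owns that device. `input_tensor`, `output_tensor`, `rank_sigs`,
# and `ctx` are assumed to be set up during single-threaded initialization.
broadcast[ngpus=4](
    input_tensor,   # this GPU's TileTensor; only the root's data is sent
    output_tensor,  # destination TileTensor on this GPU
    rank_sigs,      # InlineArray of Signal pointers shared across all ranks
    ctx,            # DeviceContext for this GPU
    root=0,         # rank whose input_tensor is broadcast to every GPU
)
```

The `dtype` and layout parameters are inferred from the tensor arguments; `pdl_level`, `use_multimem`, and `_max_num_blocks` keep their defaults here.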