Mojo function
broadcast
broadcast[dtype: DType, in_layout: TensorLayout, in_origin: Origin[mut=in_origin.mut], //, ngpus: Int, pdl_level: PDLLevel = PDLLevel(), use_multimem: Bool = False](input_tensor: TileTensor[dtype, in_layout, in_origin], output_tensor: TileTensor[dtype, output_tensor.LayoutType, output_tensor.origin, address_space=output_tensor.address_space, linear_idx_type=output_tensor.linear_idx_type, element_size=output_tensor.element_size], rank_sigs: InlineArray[UnsafePointer[Signal, MutAnyOrigin], 8], ctx: DeviceContext, root: Int, _max_num_blocks: Optional[Int] = None)
Per-GPU broadcast for use in multi-threaded contexts.
Currently requires a prior single-threaded call to init_comms, as a thread-safe version is not yet implemented.
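A minimal call-site sketch, derived only from the signature above. The surrounding setup (the tensors, the `rank_sigs` array, and the per-GPU `DeviceContext`) is assumed to have been created beforehand, e.g. during the required single-threaded init_comms phase; variable names here are illustrative, not part of the API.

```mojo
# Hypothetical usage sketch -- one call per GPU, typically issued from the
# thread that owns that device. `input_tensor`, `output_tensor`, `rank_sigs`,
# and `ctx` are assumed to be set up during single-threaded initialization.
broadcast[ngpus=4](
    input_tensor,   # this GPU's TileTensor; only the root's data is sent
    output_tensor,  # destination TileTensor on this GPU
    rank_sigs,      # InlineArray of Signal pointers shared across all ranks
    ctx,            # DeviceContext for this GPU
    root=0,         # rank whose input_tensor is broadcast to every GPU
)
```

The `dtype` and layout parameters are inferred from the tensor arguments; `pdl_level`, `use_multimem`, and `_max_num_blocks` keep their defaults here.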