Mojo function

`broadcast`

```mojo
broadcast[
    dtype: DType,
    in_layout: TensorLayout,
    in_origin: Origin[mut=in_origin.mut], //,
    ngpus: Int,
    pdl_level: PDLLevel = PDLLevel(),
    use_multimem: Bool = False,
](
    input_tensor: TileTensor[dtype, in_layout, in_origin],
    output_tensor: TileTensor[dtype, in_layout],
    rank_sigs: InlineArray[UnsafePointer[Signal, MutAnyOrigin], 8],
    ctx: DeviceContext,
    root: Int,
    _max_num_blocks: Optional[Int] = None,
)
```
Broadcasts data from the root GPU to all participating GPUs.
Parameters:

- `dtype` (`DType`): Data type of the tensor elements.
- `in_layout` (`TensorLayout`): Layout of the input `TileTensor`.
- `in_origin` (`Origin[mut=in_origin.mut]`): Origin of the input `TileTensor`.
- `ngpus` (`Int`): Number of GPUs participating in the broadcast.
- `pdl_level` (`PDLLevel`): Controls PDL behavior for P2P kernels.
- `use_multimem` (`Bool`): Whether to use multimem mode for improved performance.
Args:

- `input_tensor` (`TileTensor[dtype, in_layout, in_origin]`): Input tensor from the root GPU, as a `TileTensor`.
- `output_tensor` (`TileTensor[dtype, in_layout]`): Output tensor for this GPU, as a `TileTensor`.
- `rank_sigs` (`InlineArray[UnsafePointer[Signal, MutAnyOrigin], 8]`): Per-GPU `Signal` pointers.
- `ctx` (`DeviceContext`): Device context for this GPU.
- `root` (`Int`): Root GPU rank (the source of the broadcast data).
- `_max_num_blocks` (`Optional[Int]`): Optional limit on the kernel grid size.
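The sketch below illustrates the intended call pattern, assuming one `DeviceContext` per GPU and pre-constructed tensors and `Signal` buffers. Everything except the `broadcast` call itself (the `make_tensors` and `make_signals` helpers, and the surrounding loop) is hypothetical scaffolding, not part of the documented API.

```mojo
alias NGPUS = 4
alias ROOT = 0  # rank 0 supplies the data

fn broadcast_example(ctxs: List[DeviceContext]) raises:
    # Hypothetical helpers: allocate per-GPU input/output TileTensors
    # and the per-GPU Signal pointer array (max 8 entries).
    input_tensor, output_tensors = make_tensors[dtype, layout](ctxs)
    rank_sigs = make_signals(ctxs)

    # Every participating rank launches the kernel with its own
    # context; after completion, each rank's output_tensor holds a
    # copy of the root's input_tensor.
    for rank in range(NGPUS):
        broadcast[ngpus=NGPUS](
            input_tensor,            # source data (read on the root)
            output_tensors[rank],    # destination on this GPU
            rank_sigs,               # cross-GPU synchronization signals
            ctxs[rank],              # this GPU's device context
            ROOT,
        )
```

Note that `rank_sigs` must contain valid `Signal` pointers for all participating ranks before any rank launches, since the kernel uses them for cross-device synchronization.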