Mojo function

broadcast_multimem_kernel

def broadcast_multimem_kernel[dtype: DType, Layout: TensorLayout, BLOCK_SIZE: Int, ngpus: Int, simd_width: Int = simd_width_of[dtype, get_gpu_target()]()](output: TileTensor[dtype, Layout, MutAnyOrigin], input: TileTensor[dtype, Layout, ImmutAnyOrigin], rank_sigs: InlineArray[UnsafePointer[Signal, MutAnyOrigin], 8], my_rank: Int, root: Int)

Broadcast kernel using multimem.st for multicast writes.

The root GPU writes to a multicast address, so the data appears on all GPUs. Only the root performs the stores; the other GPUs only participate in the barriers.
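The division of work described above can be illustrated with a small host-side model. This is a hedged sketch in plain Python, not the kernel's implementation: `simulate_broadcast` and `multicast_store` are hypothetical names, the multicast address is modeled as a loop over per-GPU buffers, and the barrier placement (one before and one after the write phase) is an assumption that the description does not spell out.

```python
def simulate_broadcast(ngpus, root, payload):
    """Toy model of a multicast broadcast: only `root` issues stores."""
    # One output buffer per GPU. On real hardware these live in separate
    # device memories that are all mapped behind one multicast address.
    outputs = [[0] * len(payload) for _ in range(ngpus)]
    # Number of store operations each rank issues (to show who does the work).
    stores_issued = [0] * ngpus

    def multicast_store(index, value):
        # Stand-in for a single multicast store: one write that becomes
        # visible in every GPU's buffer.
        for buf in outputs:
            buf[index] = value

    for rank in range(ngpus):
        # Assumed barrier: every rank waits until all buffers are ready.
        if rank == root:
            for i, value in enumerate(payload):
                multicast_store(i, value)
                stores_issued[rank] += 1
        # Assumed barrier: every rank waits until the root's stores are done.
        # Non-root ranks reach this point without issuing any stores.

    return outputs, stores_issued


outputs, stores_issued = simulate_broadcast(ngpus=4, root=1, payload=[10, 20, 30])
print(outputs)
print(stores_issued)
```

In this model every buffer ends up holding the payload although only the root rank issued stores, which is the property the kernel description relies on.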