IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo struct

DistributedScatter

struct DistributedScatter

Distributed scatter: send different chunks to different device groups.

Each data-parallel (DP) replica group receives a different input chunk from the root GPU. All tensor-parallel (TP) devices within the same replica receive the same chunk via a peer-to-peer (P2P) pull.

This op receives ngpus input tensors (one per GPU, padded from dp_size distinct chunks) plus ngpus signal buffers for synchronization. Because every GPU sees every chunk, all GPUs compute the same grid size, which avoids barrier deadlocks.
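The chunk-to-device mapping described above can be illustrated with a small host-side simulation. This is only a sketch in plain Python, not the Mojo kernel: the helper names `scatter_plan` and `simulate_scatter` are hypothetical, and it assumes that ngpus is a multiple of dp_size and that the devices of each replica are numbered contiguously. The page confirms neither assumption.

```python
def scatter_plan(ngpus: int, dp_size: int) -> list[int]:
    """Return, for each device index, the index of the chunk it receives."""
    if dp_size <= 0 or ngpus % dp_size != 0:
        raise ValueError("ngpus must be a positive multiple of dp_size")
    # Assumption: every replica holds the same number of TP devices.
    tp_size = ngpus // dp_size
    # Assumption: replica r owns devices [r * tp_size, (r + 1) * tp_size).
    return [device // tp_size for device in range(ngpus)]


def simulate_scatter(chunks: list[list[float]], ngpus: int) -> list[list[float]]:
    """Copy each chunk to every device of its replica (stand-in for the P2P pull)."""
    plan = scatter_plan(ngpus, dp_size=len(chunks))
    return [list(chunks[chunk_index]) for chunk_index in plan]


# Four GPUs, two DP replicas, so two TP devices per replica.
plan = scatter_plan(ngpus=4, dp_size=2)
outputs = simulate_scatter([[1.0, 2.0], [3.0, 4.0]], ngpus=4)
```

With four GPUs and two chunks, devices 0 and 1 both end up with the first chunk and devices 2 and 3 with the second, matching the "same chunk within a replica, different chunk across replicas" behavior described above.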

Implemented traits

AnyType, ImplicitlyDestructible

Methods

execute

static def execute[dtype: DType, rank: Int, root: Int, target: StringSlice[StaticConstantOrigin], _trace_name: StringSlice[StaticConstantOrigin]](outputs: _FusedOutputVariadicTensors[static_specs=outputs.static_specs], inputs: VariadicTensors[Input, static_specs=inputs.static_specs], signal_buffers: VariadicTensors[MutableInput, static_specs=signal_buffers.static_specs], dev_ctxs_input: DeviceContextList)