Mojo struct

DistributedScatter

struct DistributedScatter

Distributed scatter: send different chunks to different device groups.

Each data-parallel (DP) replica group receives a different input chunk from the root GPU. All tensor-parallel (TP) devices within the same replica get the same chunk via a peer-to-peer (P2P) pull.
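The chunk-to-device mapping described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the Mojo implementation: the helper names `replica_of` and `scatter_plan` are hypothetical, and it assumes TP groups occupy contiguous GPU indices (GPU `g` belongs to replica `g // tp_size`), which this page does not state.

```python
def replica_of(gpu: int, tp_size: int) -> int:
    """Return the DP replica index for a GPU.

    Assumption (not from the docs): TP groups are contiguous,
    so GPUs [0, tp_size) form replica 0, and so on.
    """
    return gpu // tp_size


def scatter_plan(chunks: list, tp_size: int) -> list:
    """Return the chunk each GPU ends up with, one entry per GPU."""
    dp_size = len(chunks)          # one distinct chunk per DP replica
    ngpus = dp_size * tp_size      # every replica has tp_size devices
    return [chunks[replica_of(g, tp_size)] for g in range(ngpus)]


# Two replicas of two TP devices each: GPUs 0-1 share one chunk, GPUs 2-3 the other.
plan = scatter_plan(["chunk0", "chunk1"], tp_size=2)
print(plan)
```

Under these assumptions, every device in a replica receives the same chunk and different replicas receive different chunks, which matches the behavior described for the op.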

This op receives `ngpus` input tensors (one per GPU, padded from `dp_size` distinct chunks) plus `ngpus` signal buffers for synchronization. Because every GPU sees every chunk, all GPUs compute the same grid size, which avoids barrier deadlocks.
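To see why a shared view of all chunks matters, consider a simplified Python model of the launch-size calculation. The `grid_size` helper, the block size of 256, and the "cover the largest chunk" rule are all assumptions made for illustration; the actual kernel may derive its grid differently.

```python
import math


def grid_size(chunk_elems: list, block_size: int = 256) -> int:
    """Hypothetical launch rule: enough blocks to cover the largest chunk."""
    return math.ceil(max(chunk_elems) / block_size)


chunk_elems = [1000, 4096]  # element counts of the dp_size distinct chunks

# Every GPU evaluates the rule over ALL chunks, so the results agree.
shared_view = [grid_size(chunk_elems) for _gpu in range(4)]

# If each replica looked only at its own chunk, the grids would diverge.
local_view = [grid_size([n]) for n in chunk_elems]

print(shared_view)  # identical on every GPU
print(local_view)   # differs per replica
```

In this model, a device that launched fewer blocks than its peers would reach a cross-device barrier with a different number of participants, which is the kind of mismatch that identical grid sizes rule out.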

Implemented traits

AnyType, ImplicitlyDeletable

Methods

`execute`

static def execute[dtype: DType, rank: Int, root: Int, target: StringSlice[ImmStaticOrigin], _trace_name: StringSlice[ImmStaticOrigin]](outputs: _FusedOutputVariadicTensors[static_specs=outputs.static_specs], inputs: VariadicTensors[IOSpec[_, _].Input, static_specs=inputs.static_specs], signal_buffers: VariadicTensors[IOSpec[_, _].MutableInput, static_specs=signal_buffers.static_specs], dev_ctxs_input: DeviceContextArray)
