Mojo struct
DistributedScatter
struct DistributedScatter
Distributed scatter: send different chunks to different device groups.
Each data-parallel (DP) replica group receives a different input chunk from the root GPU. All tensor-parallel (TP) devices within the same replica receive the same chunk via a peer-to-peer (P2P) pull.
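To make the grouping concrete, here is a pure-Python model of which chunk each GPU ends up holding. It is not the MAX or Mojo API. It assumes GPUs are numbered contiguously by replica, so with `tp_size = ngpus // dp_size`, replica `r` owns GPUs `r * tp_size` through `(r + 1) * tp_size - 1`. The helpers `replica_of` and `scatter_reference` are hypothetical.

```python
def replica_of(device: int, ngpus: int, dp_size: int) -> int:
    """Return the DP replica index that `device` belongs to.

    Assumption: contiguous numbering, so replica r owns GPUs
    [r * tp_size, (r + 1) * tp_size) with tp_size = ngpus // dp_size.
    """
    if ngpus % dp_size != 0:
        raise ValueError("ngpus must be divisible by dp_size")
    tp_size = ngpus // dp_size
    return device // tp_size


def scatter_reference(chunks: list, ngpus: int) -> list:
    """Model the scatter result: GPU i holds chunks[replica_of(i)].

    `chunks` are the dp_size distinct chunks on the root GPU; every
    TP device inside one replica ends up with the same chunk.
    """
    dp_size = len(chunks)
    return [chunks[replica_of(i, ngpus, dp_size)] for i in range(ngpus)]


# 4 GPUs split into 2 DP replicas of 2 TP devices each.
print(scatter_reference(["A", "B"], ngpus=4))  # ['A', 'A', 'B', 'B']
```

Under a different device-numbering scheme only `replica_of` would change; the rule "one chunk per replica, shared by its TP devices" stays the same.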
This op takes ngpus input tensors (one per GPU, padded out from dp_size distinct chunks) plus ngpus signal buffers used for synchronization. Every GPU sees all chunks, so all GPUs compute the same grid size, which avoids deadlocks at synchronization barriers.
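The grid-size argument can be modeled in plain Python. This is an illustration, not the kernel: it assumes the padded input list repeats each distinct chunk once per TP device, and that the launch grid is sized from the largest chunk. `BLOCK_SIZE` and `grid_size` are hypothetical names.

```python
import math

BLOCK_SIZE = 256  # hypothetical number of elements handled per thread block


def grid_size(inputs: list) -> int:
    """Size the launch grid from ALL inputs, so every GPU gets the same answer."""
    largest = max(len(chunk) for chunk in inputs)
    return math.ceil(largest / BLOCK_SIZE)


# Two DP replicas with uneven chunks, padded to ngpus = 4 input slots.
# Assumed layout: each distinct chunk repeated once per TP device.
chunk_a, chunk_b = list(range(1000)), list(range(300))
inputs = [chunk_a, chunk_a, chunk_b, chunk_b]
ngpus = len(inputs)

# Every GPU evaluates grid_size on the same list, so all agree.
grids = [grid_size(inputs) for _ in range(ngpus)]
print(grids)  # [4, 4, 4, 4]

# Sizing from the local chunk alone would make the replicas disagree
# (4 blocks vs 2), so a barrier waiting on peer blocks could hang.
local_grids = [math.ceil(len(inputs[gpu]) / BLOCK_SIZE) for gpu in range(ngpus)]
print(local_grids)  # [4, 4, 2, 2]
```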
Implemented traits
AnyType, ImplicitlyDestructible
Methods
execute
static def execute[dtype: DType, rank: Int, root: Int, target: StringSlice[StaticConstantOrigin], _trace_name: StringSlice[StaticConstantOrigin]](outputs: _FusedOutputVariadicTensors[static_specs=outputs.static_specs], inputs: VariadicTensors[Input, static_specs=inputs.static_specs], signal_buffers: VariadicTensors[MutableInput, static_specs=signal_buffers.static_specs], dev_ctxs_input: DeviceContextList)
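Before invoking `execute`, a caller might check that the argument lists line up. This Python sketch encodes what the description states (ngpus inputs and ngpus signal buffers) plus three assumptions marked in the comments: one device context per GPU, `root` being a valid GPU index, and ngpus splitting evenly into dp_size replica groups. `check_scatter_args` is hypothetical and not part of the MAX API.

```python
def check_scatter_args(num_inputs: int, num_signal_buffers: int,
                       num_dev_ctxs: int, root: int, dp_size: int) -> int:
    """Validate argument counts for a DistributedScatter-style call.

    Returns ngpus on success and raises ValueError otherwise.
    """
    ngpus = num_inputs  # stated: one input tensor per GPU
    if num_signal_buffers != ngpus:  # stated: ngpus signal buffers
        raise ValueError("expected one signal buffer per GPU")
    if num_dev_ctxs != ngpus:  # assumption: one device context per GPU
        raise ValueError("expected one device context per GPU")
    if not 0 <= root < ngpus:  # assumption: root is a GPU index
        raise ValueError("root must name one of the participating GPUs")
    if dp_size <= 0 or ngpus % dp_size != 0:  # assumption: equal-size groups
        raise ValueError("ngpus must split evenly into dp_size replica groups")
    return ngpus


print(check_scatter_args(4, 4, 4, root=0, dp_size=2))  # 4
```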