Mojo struct
ReduceScatterConfig
struct ReduceScatterConfig[dtype: DType, ngpus: Int, simd_width: Int = simd_width_of[dtype, get_gpu_target()](), alignment: Int = align_of[SIMD[dtype, simd_width]](), accum_type: DType = get_accum_type[dtype]()]
Configuration for axis-aware reduce-scatter partitioning.
Divides axis_size units evenly across GPUs. Lower ranks get one extra
unit when there's a remainder. The 1D case is a special case where
axis_size = num_elements // simd_width and unit_numel = simd_width.
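The partitioning rule described above (even division of axis_size units, with lower ranks absorbing the remainder) can be sketched in plain Python. This is an illustrative sketch, not part of the Mojo API; the helper name `partition` is hypothetical:

```python
def partition(axis_size: int, ngpus: int) -> list[tuple[int, int]]:
    """Divide axis_size units across ngpus; lower ranks get one extra unit
    when there is a remainder. Returns (start_unit, num_units) per rank."""
    base, remainder = divmod(axis_size, ngpus)
    spans = []
    for rank in range(ngpus):
        units = base + (1 if rank < remainder else 0)
        # Ranks below `remainder` each shifted the start by one extra unit.
        start = rank * base + min(rank, remainder)
        spans.append((start, units))
    return spans

# 10 units over 4 GPUs: ranks 0 and 1 get one extra unit.
print(partition(10, 4))  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Note that the spans are contiguous and cover all axis_size units exactly once, which is what a reduce-scatter requires.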
Fields

- stride (Int)
- axis_part (Int)
- axis_remainder (Int)
- unit_numel (Int)
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
Methods
__init__
__init__(axis_size: Int, unit_numel: Int, threads_per_gpu: Int) -> Self
General constructor for axis-aware partitioning.
__init__(num_elements: Int, threads_per_gpu: Int) -> Self
1D convenience constructor. Partitions by SIMD vectors.
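Per the struct docstring, the 1D constructor derives axis_size = num_elements // simd_width and unit_numel = simd_width, so each partition unit is one SIMD vector. An illustrative Python sketch (the helper `config_1d` is hypothetical, not part of this API):

```python
def config_1d(num_elements: int, simd_width: int) -> tuple[int, int]:
    """1D case: treat each SIMD vector of simd_width elements as one unit."""
    # Assumption for this sketch: the element count is SIMD-aligned.
    assert num_elements % simd_width == 0
    axis_size = num_elements // simd_width  # number of SIMD-vector units
    unit_numel = simd_width                 # elements per unit
    return axis_size, unit_numel

print(config_1d(1024, 8))  # (128, 8)
```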
rank_unit_start
rank_unit_start(self, rank: Int) -> Int
Start unit index along scatter axis for this rank.
Returns: The starting unit index along the scatter axis for rank.
rank_units
rank_num_elements
rank_start
rank_end
rank_part
rank_part(self, rank: Int) -> Int
Number of elements for this rank (alias for rank_num_elements).
Returns: The number of elements assigned to rank.
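A rank's element count follows from the unit partitioning: its share of units times the elements per unit, with lower ranks taking the remainder units. An illustrative Python sketch (hypothetical helper, not the Mojo implementation):

```python
def rank_num_elements(axis_size: int, ngpus: int,
                      unit_numel: int, rank: int) -> int:
    """Elements owned by `rank`: its unit count times elements per unit."""
    base, remainder = divmod(axis_size, ngpus)
    units = base + (1 if rank < remainder else 0)  # lower ranks take the remainder
    return units * unit_numel

# 10 units of 8 elements over 4 GPUs: ranks 0-1 own 24, ranks 2-3 own 16.
print([rank_num_elements(10, 4, 8, r) for r in range(4)])  # [24, 24, 16, 16]
```

The per-rank counts always sum to axis_size * unit_numel, the total element count.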
thr_local_start