Mojo function

lane_group_sum_and_broadcast

lane_group_sum_and_broadcast[val_type: DType, simd_width: Int, //, num_lanes: Int, stride: Int = 1](val: SIMD[val_type, simd_width]) -> SIMD[val_type, simd_width]

Computes the sum across a lane group and broadcasts the result to all lanes.

This function performs a parallel reduction using a butterfly pattern to compute the sum, then broadcasts the result to all participating lanes. At each step of the butterfly, lanes exchange partial sums in pairs through warp shuffle operations, so a power-of-two group of num_lanes lanes needs only log2(num_lanes) steps and no shared memory.
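To make the butterfly pattern concrete, here is a minimal host-side sketch in Python that models one lane group as a list. It is not the Mojo implementation: the name `butterfly_sum_broadcast` is hypothetical, and pairing lanes by XOR is an assumption about how the butterfly is realized (it is a common choice, but this page does not say which shuffle is used).

```python
def butterfly_sum_broadcast(values):
    """Simulate a butterfly sum-and-broadcast over one lane group.

    values[i] is the value held by lane i. The group size must be a
    power of two. Hypothetical host-side model, not the Mojo code.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("group size must be a power of two")

    lanes = list(values)
    offset = n // 2
    while offset >= 1:
        # Assumption: each lane's partner is lane_id XOR offset.
        # All lanes read their partner's current value at once (the
        # role a warp shuffle plays), then add it to their own value.
        lanes = [lanes[i] + lanes[i ^ offset] for i in range(n)]
        offset //= 2
    return lanes


result = butterfly_sum_broadcast([1, 2, 3, 4, 5, 6, 7, 8])
print(result)  # [36, 36, 36, 36, 36, 36, 36, 36]
```

Because every lane both sends and receives at each step, all lanes end up holding the full sum after the last step, which is why no separate broadcast pass is needed in this model.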

Parameters:

  • val_type (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in the SIMD vector.
  • num_lanes (Int): The number of threads participating in the reduction.
  • stride (Int): The stride between lanes participating in the reduction.
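The interaction between num_lanes and stride can be illustrated with a small Python sketch that lists which lane ids would end up in the same group. This rests on an assumption not stated on this page: that partners are reached by XOR with offsets of stride * 2^i, with both num_lanes and stride powers of two. The helper `lane_group` is hypothetical and exists only for illustration.

```python
def lane_group(lane_id, num_lanes, stride=1):
    """Return the sorted lane ids that exchange values with lane_id.

    Assumption: partners are found by XOR with offsets stride * 2**i,
    and num_lanes and stride are both powers of two.
    """
    group = {lane_id}
    offset = num_lanes // 2
    while offset >= 1:
        # Each butterfly step doubles the set of lanes whose values
        # have been combined with lane_id's value.
        group |= {lane ^ (offset * stride) for lane in group}
        offset //= 2
    return sorted(group)


print(lane_group(0, num_lanes=4))            # [0, 1, 2, 3]
print(lane_group(0, num_lanes=4, stride=8))  # [0, 8, 16, 24]
print(lane_group(9, num_lanes=4, stride=8))  # [1, 9, 17, 25]
```

Under this reading, stride = 1 groups adjacent lanes, while a larger stride groups lanes spaced that far apart.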

Args:

  • val (SIMD[val_type, simd_width]): The SIMD value to reduce. Each lane contributes its value to the sum.

Returns:

A SIMD value in which every participating lane holds the sum computed across the lane group. Non-participating lanes (lane_id >= num_lanes) retain their original values.