Mojo function

lane_group_max_and_broadcast

lane_group_max_and_broadcast[val_type: DType, simd_width: Int, //, num_lanes: Int, stride: Int = 1](val: SIMD[val_type, simd_width]) -> SIMD[val_type, simd_width]

Reduces and broadcasts the maximum value within a lane group using warp-level operations.

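This function performs a parallel reduction to find the maximum value across a lane group and leaves that maximum in every participating lane. Both the reduction and the broadcast are carried out with warp shuffle operations arranged in a butterfly pattern: in each step a lane exchanges its running maximum with a partner lane, so after the final step all lanes in the group hold the same result without a separate broadcast pass.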
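The butterfly pattern described above can be modelled on the host with a short Python sketch. This is not the Mojo implementation: it simulates a single lane group as a plain list, assumes the group size is a power of two, assumes a stride of 1, and uses an XOR of the lane index to pick each partner, which is the usual way a butterfly exchange is expressed. The name `butterfly_max` is hypothetical.

```python
def butterfly_max(values):
    """Host-side model of a butterfly max reduction over one lane group.

    `values` holds one scalar per lane. Assumption of this sketch: the
    group size is a power of two and the stride between lanes is 1.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two group size")

    lanes = list(values)
    offset = n // 2
    while offset >= 1:
        # Simulated shuffle: every lane reads its partner's value from the
        # previous round (lane index XOR offset) and keeps the larger one.
        lanes = [max(v, lanes[i ^ offset]) for i, v in enumerate(lanes)]
        offset //= 2
    return lanes


if __name__ == "__main__":
    print(butterfly_max([3, 9, 1, 4, 7, 2, 8, 5]))
```

With eight lanes the loop runs three rounds (offsets 4, 2, 1), and every lane ends up holding 9, the group maximum. Because each round is symmetric between partners, the reduction and the broadcast happen together, which is why no final "send the result back" step appears.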

Parameters:

val_type (DType): The data type of the SIMD elements (e.g. float32, int32). Inferred from val.
simd_width (Int): The number of elements in the SIMD vector. Inferred from val.
num_lanes (Int): The number of threads participating in the reduction.
stride (Int): The stride between lanes participating in the reduction. Defaults to 1.

Args:

val (SIMD): The SIMD value to reduce and broadcast. Each lane contributes its value.

Returns:

SIMD: A SIMD value where all participating lanes contain the maximum value found across the lane group. Non-participating lanes (lane_id >= num_lanes) retain their original values.