Mojo function

lane_group_reduce

lane_group_reduce[val_type: DType, simd_width: Int, //, shuffle: fn[DType, Int](val: SIMD[$0, $1], offset: SIMD[uint32, 1]) -> SIMD[$0, $1], func: fn[DType, Int](SIMD[$0, $1], SIMD[$0, $1]) capturing -> SIMD[$0, $1], num_lanes: Int, *, stride: Int = 1](val: SIMD[val_type, simd_width]) -> SIMD[val_type, simd_width]

Performs a generic warp-level reduction operation using shuffle operations.

This function implements a parallel reduction across threads in a warp using a butterfly pattern. It allows customizing both the shuffle operation and reduction function.
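
To make the pattern concrete, here is a minimal host-side sketch that emulates a shuffle-down reduction over 8 lanes held in one SIMD vector. It does not call `lane_group_reduce` and does not run on a GPU. The names `emulated_shuffle_down` and `emulated_group_sum` are hypothetical, and two things are assumptions rather than statements about the library's implementation: the offset schedule (halving from `n / 2` down to 1) and the rule that a lane with no source lane keeps its own value.

```mojo
# Hypothetical helper: lane i reads the value held by lane i + offset.
# Assumption: lanes whose source index falls outside the group keep their own value.
fn emulated_shuffle_down[
    n: Int
](lanes: SIMD[DType.float32, n], offset: Int) -> SIMD[DType.float32, n]:
    var shifted = lanes
    for i in range(n):
        if i + offset < n:
            shifted[i] = lanes[i + offset]
    return shifted


# Hypothetical helper: combine each lane with its shuffled partner,
# halving the offset each step (assumed schedule).
fn emulated_group_sum[
    n: Int
](lanes: SIMD[DType.float32, n]) -> SIMD[DType.float32, n]:
    var res = lanes
    var offset = n // 2
    while offset > 0:
        res = res + emulated_shuffle_down(res, offset)
        offset //= 2
    return res


def main():
    var lanes = SIMD[DType.float32, 8](1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
    var reduced = emulated_group_sum(lanes)
    # Element 0 now holds 1 + 2 + ... + 8 = 36; the other elements hold partial sums.
    print(reduced[0])
```

In this emulation only element 0 ends up with the full sum, which is typical of down-shuffle reductions. An XOR-style shuffle would instead leave the full result in every lane.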

Example:

```mojo
from gpu.warp import lane_group_reduce, shuffle_down

# Compute sum across 16 threads using shuffle down
@parameter
fn add[type: DType, width: Int](x: SIMD[type, width], y: SIMD[type, width]) -> SIMD[type, width]:
    return x + y

var val = SIMD[DType.float32, 16](42.0)
var result = lane_group_reduce[shuffle_down, add, num_lanes=16](val)
```

Parameters:

  • val_type (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in the SIMD vector.
  • shuffle (fn[DType, Int](val: SIMD[$0, $1], offset: SIMD[uint32, 1]) -> SIMD[$0, $1]): A function that performs the warp shuffle operation. Takes a SIMD value and offset and returns the shuffled result.
  • func (fn[DType, Int](SIMD[$0, $1], SIMD[$0, $1]) capturing -> SIMD[$0, $1]): A binary function that combines two SIMD values during reduction. This defines the reduction operation (e.g. add, max, min).
  • num_lanes (Int): The number of lanes in a group. The reduction is done within each group. Must be a power of 2.
  • stride (Int): The stride between lanes participating in the reduction.
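
The `func` parameter selects which reduction is performed. As an illustration modelled on the sum example above, the sketch below swaps the addition for an elementwise maximum. The wrapper `group_max_16` and the closure `take_max` are hypothetical names. The sketch assumes it is called from GPU kernel code in which all 16 lanes of the group are active, and that the built-in `max` accepts two SIMD values of the same type.

```mojo
from gpu.warp import lane_group_reduce, shuffle_down


# Hypothetical wrapper: maximum across a group of 16 lanes.
# Assumption: called from GPU kernel code with all 16 lanes participating.
fn group_max_16(val: SIMD[DType.float32, 1]) -> SIMD[DType.float32, 1]:
    # Binary combiner passed as `func`; it must be a capturing (@parameter) closure.
    @parameter
    fn take_max[
        dtype: DType, width: Int
    ](x: SIMD[dtype, width], y: SIMD[dtype, width]) -> SIMD[dtype, width]:
        return max(x, y)

    # `stride` is left at its default of 1.
    return lane_group_reduce[shuffle_down, take_max, num_lanes=16](val)
```

Because `val_type` and `simd_width` are inferred from `val`, only `shuffle`, `func`, and `num_lanes` need to be given explicitly.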

Args:

  • val (SIMD[val_type, simd_width]): The SIMD value to reduce. Each lane contributes its value.

Returns:

A SIMD value containing the reduction result.