Mojo function

reduce

reduce[: origin.set, //, reduce_fn: fn[DType, DType, Int](SIMD[$0, $2], SIMD[$1, $2]) capturing -> SIMD[$0, $2]](src: Buffer[type, size, address_space=address_space, origin=origin], init: SIMD[type, 1]) -> SIMD[type, 1]

Computes a custom reduction of buffer elements.

Parameters:

reduce_fn (fn[DType, DType, Int](SIMD[$0, $2], SIMD[$1, $2]) capturing -> SIMD[$0, $2]): The lambda implementing the reduction.

Args:

src (Buffer[type, size, address_space=address_space, origin=origin]): The input buffer.
init (SIMD[type, 1]): The initial value to use in accumulator.

Returns:

The computed reduction value.

reduce[: origin.set, //, map_fn: fn[DType, DType, Int](SIMD[$0, $2], SIMD[$1, $2]) capturing -> SIMD[$0, $2], reduce_fn: fn[DType, Int](SIMD[$0, $1]) -> SIMD[$0, 1], reduce_axis: Int](src: NDBuffer[type, rank, shape, strides, alignment=alignment, address_space=address_space, exclusive=exclusive], dst: NDBuffer[type, rank, shape, strides, alignment=alignment, address_space=address_space, exclusive=exclusive], init: SIMD[type, 1])

Performs a reduction across reduce_axis of an NDBuffer (src) and stores the result in an NDBuffer (dst).

First src is reshaped into a 3D tensor. Without loss of generality, the three axes will be referred to as [H,W,C], where the axis to reduce across is W, the axes before the reduce axis are packed into H, and the axes after the reduce axis are packed into C. i.e. a tensor with dims [D1, D2, ..., Di, ..., Dn] reducing across axis i gets packed into a 3D tensor with dims [H, W, C], where H=prod(D1,...,Di-1), W = Di, and C = prod(Di+1,...,Dn).

Parameters:

map_fn (fn[DType, DType, Int](SIMD[$0, $2], SIMD[$1, $2]) capturing -> SIMD[$0, $2]): A mapping function. This function is used when to combine (accumulate) two chunks of input data: e.g. we load two 8xfloat32 vectors of elements and need to reduce them to a single 8xfloat32 vector.
reduce_fn (fn[DType, Int](SIMD[$0, $1]) -> SIMD[$0, 1]): A reduction function. This function is used to reduce a vector to a scalar. E.g. when we got 8xfloat32 vector and want to reduce it to 1xfloat32.
reduce_axis (Int): The axis to reduce across.

Args:

src (NDBuffer[type, rank, shape, strides, alignment=alignment, address_space=address_space, exclusive=exclusive]): The input buffer.
dst (NDBuffer[type, rank, shape, strides, alignment=alignment, address_space=address_space, exclusive=exclusive]): The output buffer.
init (SIMD[type, 1]): The initial value to use in accumulator.