# reduction

Implements SIMD reductions.

You can import these APIs from the algorithm package. For example:

from algorithm import map_reduce

## map_reduce

map_reduce[simd_width: Int, size: Dim, type: DType, acc_type: DType, input_gen_fn: fn[DType, Int](Int, /) capturing -> SIMD[$0,$1], reduce_vec_to_vec_fn: fn[DType, DType, Int](SIMD[$0,$2], SIMD[$1,$2], /) capturing -> SIMD[$0,$2], reduce_vec_to_scalar_fn: fn[DType, Int](SIMD[$0,$1], /) -> SIMD[$0, 1]](dst: Buffer[type, size, 0], init: SIMD[acc_type, 1]) -> SIMD[acc_type, 1]

Stores the result of calling input_gen_fn in dst and simultaneously reduces the result using a custom reduction function.

Parameters:

• simd_width (Int): The vector width for the computation.
• size (Dim): The buffer size.
• type (DType): The buffer elements dtype.
• acc_type (DType): The dtype of the reduction accumulator.
• input_gen_fn (fn[DType, Int](Int, /) capturing -> SIMD[$0, $1]): A function that generates inputs to reduce.
• reduce_vec_to_vec_fn (fn[DType, DType, Int](SIMD[$0, $2], SIMD[$1, $2], /) capturing -> SIMD[$0, $2]): A mapping function. This function is used to combine (accumulate) two chunks of input data: e.g. we load two 8xfloat32 vectors of elements and need to reduce them into a single 8xfloat32 vector.
• reduce_vec_to_scalar_fn (fn[DType, Int](SIMD[$0, $1], /) -> SIMD[$0, 1]): A reduction function. This function is used to reduce a vector to a scalar, e.g. when we have an 8xfloat32 vector and want to reduce it to a float32 scalar.

Args:

• dst (Buffer[type, size, 0]): The output buffer.
• init (SIMD[acc_type, 1]): The initial value to use in the accumulator.

Returns:

The computed reduction value.
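To make the data flow concrete, here is a minimal pure-Python model of the behavior described above. It is a sketch, not the Mojo implementation: Python lists stand in for SIMD vectors, and the name `map_reduce_model`, the two-argument `input_gen_fn(index, width)` and the returned `(dst, value)` pair are assumptions made for illustration. The sketch also assumes that leftover elements are handled as width-1 vectors and that init is folded into the result exactly once.

```python
def map_reduce_model(simd_width, size, input_gen_fn,
                     reduce_vec_to_vec_fn, reduce_vec_to_scalar_fn, init):
    """Illustrative model only: fill dst while reducing the generated values."""
    dst = [None] * size
    acc = init
    vector_end = size - size % simd_width

    # Vectorized part: generate a chunk, store it, and accumulate lane-wise.
    vec_acc = None
    for i in range(0, vector_end, simd_width):
        chunk = input_gen_fn(i, simd_width)
        dst[i:i + simd_width] = chunk
        vec_acc = chunk if vec_acc is None else reduce_vec_to_vec_fn(vec_acc, chunk)
    if vec_acc is not None:
        # Collapse the vector accumulator to a scalar and combine it with init.
        acc = reduce_vec_to_vec_fn([acc], [reduce_vec_to_scalar_fn(vec_acc)])[0]

    # Tail part (assumption): remaining elements are processed one at a time.
    for i in range(vector_end, size):
        chunk = input_gen_fn(i, 1)
        dst[i] = chunk[0]
        acc = reduce_vec_to_vec_fn([acc], chunk)[0]
    return dst, acc


def squares(i, width):
    # Hypothetical generator: the squares of the indices i .. i + width - 1.
    return [float(j * j) for j in range(i, i + width)]


def add(a, b):
    # Lane-wise addition of two equally sized "vectors".
    return [x + y for x, y in zip(a, b)]


dst, total = map_reduce_model(4, 10, squares, add, sum, 0.0)
```

Here `dst` ends up holding the ten generated squares and `total` holds their sum, so one pass both materializes the data and reduces it.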

## reduce

reduce[reduce_fn: fn[DType, DType, Int](SIMD[$0,$2], SIMD[$1,$2], /) capturing -> SIMD[$0,$2], src: DType, src: Dim, src: AddressSpace, init: DType](src: Buffer[src, src, src], init: SIMD[init, 1]) -> SIMD[init, 1]

Computes a custom reduction of buffer elements.

Parameters:

• reduce_fn (fn[DType, DType, Int](SIMD[$0,$2], SIMD[$1,$2], /) capturing -> SIMD[$0,$2]): The lambda implementing the reduction.

Args:

• src (Buffer[src, src, src]): The input buffer.
• init (SIMD[init, 1]): The initial value to use in the accumulator.

Returns:

The computed reduction value.
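The role of reduce_fn and init can be illustrated with a small pure-Python model. This is only a sketch of the semantics with scalars in place of SIMD vectors; `reduce_model` is a hypothetical name, not part of the algorithm package. It also shows how sum, product, max, and min style reductions differ only in the combining function and the initial value.

```python
import math
import operator


def reduce_model(src, reduce_fn, init):
    """Illustrative model only: fold reduce_fn over src, starting from init."""
    acc = init
    for x in src:
        acc = reduce_fn(acc, x)
    return acc


data = [3.0, 1.0, 4.0, 1.0, 5.0]
total = reduce_model(data, operator.add, 0.0)      # init is the additive identity
prod = reduce_model(data, operator.mul, 1.0)       # init is the multiplicative identity
largest = reduce_model(data, max, -math.inf)       # init is below every element
smallest = reduce_model(data, min, math.inf)       # init is above every element
```

Choosing init as the identity of the operation keeps it from changing the result.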

reduce[map_fn: fn[DType, DType, Int](SIMD[$0,$2], SIMD[$1,$2], /) capturing -> SIMD[$0,$2], reduce_fn: fn[DType, Int](SIMD[$0,$1], /) -> SIMD[$0, 1], reduce_axis: Int, src: DType, src: Int, src: DimList, src: AddressSpace, dst: DType, dst: Int, dst: DimList, dst: AddressSpace](src: NDBuffer[src, src, src, src], dst: NDBuffer[dst, dst, dst, dst], init: SIMD[dst, 1])

Performs a reduction across reduce_axis of an NDBuffer (src) and stores the result in an NDBuffer (dst).

First, src is reshaped into a 3D tensor. Without loss of generality, the three axes are referred to as [H, W, C], where the axis to reduce across is W, the axes before the reduce axis are packed into H, and the axes after the reduce axis are packed into C. That is, a tensor with dims [D1, D2, ..., Di, ..., Dn] reduced across axis i is packed into a 3D tensor with dims [H, W, C], where H = prod(D1, ..., Di-1), W = Di, and C = prod(Di+1, ..., Dn).

Parameters:

• map_fn (fn[DType, DType, Int](SIMD[$0, $2], SIMD[$1, $2], /) capturing -> SIMD[$0, $2]): A mapping function. This function is used to combine (accumulate) two chunks of input data: e.g. we load two 8xfloat32 vectors of elements and need to reduce them to a single 8xfloat32 vector.
• reduce_fn (fn[DType, Int](SIMD[$0, $1], /) -> SIMD[$0, 1]): A reduction function. This function is used to reduce a vector to a scalar, e.g. when we have an 8xfloat32 vector and want to reduce it to 1xfloat32.
• reduce_axis (Int): The axis to reduce across.

Args:

• src (NDBuffer[src, src, src, src]): The input buffer.
• dst (NDBuffer[dst, dst, dst, dst]): The output buffer.
• init (SIMD[dst, 1]): The initial value to use in the accumulator.
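The [H, W, C] packing can be sketched in plain Python. The helper names `pack_hwc` and `reduce_axis_model` are hypothetical, the data is assumed to be a flat row-major list, and the output is assumed to hold one value per (H, C) position; none of this is taken from the Mojo implementation.

```python
from math import prod


def pack_hwc(shape, reduce_axis):
    """Collapse an N-D shape into (H, W, C) around reduce_axis."""
    h = prod(shape[:reduce_axis])        # axes before the reduce axis (empty product is 1)
    w = shape[reduce_axis]               # the axis being reduced
    c = prod(shape[reduce_axis + 1:])    # axes after the reduce axis
    return h, w, c


def reduce_axis_model(flat, shape, reduce_axis, combine, init):
    """Illustrative model only: reduce a flat row-major buffer across one axis."""
    h, w, c = pack_hwc(shape, reduce_axis)
    out = [init] * (h * c)
    for i in range(h):
        for j in range(w):
            for k in range(c):
                # Element (i, j, k) of the packed tensor lives at (i * w + j) * c + k.
                out[i * c + k] = combine(out[i * c + k], flat[(i * w + j) * c + k])
    return out


shape = (2, 3, 4)
flat = list(range(24))
sums = reduce_axis_model(flat, shape, 1, lambda a, b: a + b, 0)
```

With shape (2, 3, 4) and reduce_axis 1, the packing is H = 2, W = 3, C = 4, and `sums` has 2 * 4 = 8 entries.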

## reduce_boolean

reduce_boolean[reduce_fn: fn[DType, Int](SIMD[$0,$1], /) capturing -> Bool, continue_fn: fn(Bool, /) capturing -> Bool, src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src], init: Bool) -> Bool

Computes a bool reduction of buffer elements. The reduction will early exit if the continue_fn returns False.

Parameters:

• reduce_fn (fn[DType, Int](SIMD[$0,$1], /) capturing -> Bool): A boolean reduction function. This function is used to reduce a vector to a scalar, e.g. when we have an 8xfloat32 vector and want to reduce it to a bool.
• continue_fn (fn(Bool, /) capturing -> Bool): A function to indicate whether we want to continue processing the rest of the iterations. This takes the result of the reduce_fn and returns True to continue processing and False to early exit.

Args:

• src (Buffer[src, src, src]): The input buffer.
• init (Bool): The initial value to use.

Returns:

The computed reduction value.
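The early-exit behavior can be modeled with a short pure-Python sketch. The name `reduce_boolean_model` and the explicit `simd_width` argument are assumptions for illustration, and the exact order in which the real function visits chunks is not specified here.

```python
def reduce_boolean_model(src, simd_width, reduce_fn, continue_fn, init):
    """Illustrative model only: chunked boolean reduction with early exit."""
    result = init
    for i in range(0, len(src), simd_width):
        result = reduce_fn(src[i:i + simd_width])
        if not continue_fn(result):
            break  # continue_fn returned False, so skip the remaining chunks
    return result


seen = []  # records which chunks were actually inspected


def all_nonzero(chunk):
    seen.append(list(chunk))
    return all(v != 0 for v in chunk)


def keep_going_while_true(value):
    # For an "all" style reduction, a single False settles the answer.
    return value


data = [1, 1, 0, 1, 1, 1, 1, 1]
answer = reduce_boolean_model(data, 4, all_nonzero, keep_going_while_true, True)
```

Because the first chunk already contains a zero, the second chunk is never examined.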

## max

max[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> SIMD[src, 1]

Computes the max element in a buffer.

Args:

• src (Buffer[src, src, src]): The buffer.

Returns:

The maximum of the buffer elements.

max[reduce_axis: Int, src: DType, src: Int, src: DimList, src: AddressSpace, dst: DimList](src: NDBuffer[src, src, src, src], dst: NDBuffer[src, src, dst, 0])

Computes the max across reduce_axis of an NDBuffer.

Parameters:

• reduce_axis (Int): The axis to reduce across.

Args:

• src (NDBuffer[src, src, src, src]): The input buffer.
• dst (NDBuffer[src, src, dst, 0]): The output buffer.

## min

min[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> SIMD[src, 1]

Computes the min element in a buffer.

Args:

• src (Buffer[src, src, src]): The buffer.

Returns:

The minimum of the buffer elements.

min[reduce_axis: Int, src: DType, src: Int, src: DimList, src: AddressSpace, dst: DimList](src: NDBuffer[src, src, src, src], dst: NDBuffer[src, src, dst, 0])

Computes the min across reduce_axis of an NDBuffer.

Parameters:

• reduce_axis (Int): The axis to reduce across.

Args:

• src (NDBuffer[src, src, src, src]): The input buffer.
• dst (NDBuffer[src, src, dst, 0]): The output buffer.

## sum

sum[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> SIMD[src, 1]

Computes the sum of buffer elements.

Args:

• src (Buffer[src, src, src]): The buffer.

Returns:

The sum of the buffer elements.

sum[reduce_axis: Int, src: DType, src: Int, src: DimList, src: AddressSpace, dst: DimList](src: NDBuffer[src, src, src, src], dst: NDBuffer[src, src, dst, 0])

Computes the sum across reduce_axis of an NDBuffer.

Parameters:

• reduce_axis (Int): The axis to reduce across.

Args:

• src (NDBuffer[src, src, src, src]): The input buffer.
• dst (NDBuffer[src, src, dst, 0]): The output buffer.

## product

product[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> SIMD[src, 1]

Computes the product of the buffer elements.

Args:

• src (Buffer[src, src, src]): The buffer.

Returns:

The product of the buffer elements.

product[reduce_axis: Int, src: DType, src: Int, src: DimList, src: AddressSpace, dst: DimList](src: NDBuffer[src, src, src, src], dst: NDBuffer[src, src, dst, 0])

Computes the product across reduce_axis of an NDBuffer.

Parameters:

• reduce_axis (Int): The axis to reduce across.

Args:

• src (NDBuffer[src, src, src, src]): The input buffer.
• dst (NDBuffer[src, src, dst, 0]): The output buffer.

## mean

mean[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> SIMD[src, 1]

Computes the mean value of the elements in a buffer.

Args:

• src (Buffer[src, src, src]): The buffer of elements for which the mean is computed.

Returns:

The mean value of the elements in the given buffer.

mean[reduce_axis: Int, src: DType, src: Int, src: DimList, src: AddressSpace, dst: DimList](src: NDBuffer[src, src, src, src], dst: NDBuffer[src, src, dst, 0])

Computes the mean across reduce_axis of an NDBuffer.

Parameters:

• reduce_axis (Int): The axis to reduce across.

Args:

• src (NDBuffer[src, src, src, src]): The input buffer.
• dst (NDBuffer[src, src, dst, 0]): The output buffer.

mean[type: DType, input_fn: fn[Int, Int](StaticIntTuple[$1], /) capturing -> SIMD[type,$0], output_fn: fn[Int, Int](StaticIntTuple[$1], SIMD[type,$0], /) capturing -> None, single_thread_blocking_override: Bool, target: StringLiteral, input_shape: Int](input_shape: StaticIntTuple[input_shape], reduce_dim: Int, output_shape: StaticIntTuple[input_shape])

Computes the mean across the input and output shape.

This performs the mean computation on the domain specified by input_shape, loading the inputs using input_fn. The results' domain is output_shape, and the results are stored using output_fn.

Parameters:

• type (DType): The type of the input and output.
• input_fn (fn[Int, Int](StaticIntTuple[$1], /) capturing -> SIMD[type,$0]): The function to load the input.
• output_fn (fn[Int, Int](StaticIntTuple[$1], SIMD[type,$0], /) capturing -> None): The function to store the output.
• single_thread_blocking_override (Bool): If True, then the operation is run synchronously using a single thread.
• target (StringLiteral): The target to run on.

Args:

• input_shape (StaticIntTuple[input_shape]): The input shape.
• reduce_dim (Int): The axis to perform the mean on.
• output_shape (StaticIntTuple[input_shape]): The output shape.
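The callback-based overload can be pictured with the following pure-Python sketch. It is a model under stated assumptions, not the Mojo implementation: `mean_model` is a hypothetical name, the callbacks work on single scalars rather than SIMD vectors, and the output shape is assumed to keep the reduced dimension with extent 1.

```python
from itertools import product


def mean_model(input_fn, output_fn, input_shape, reduce_dim):
    """Illustrative model only: mean along reduce_dim using load/store callbacks."""
    extent = input_shape[reduce_dim]
    # Assumption: the output keeps reduce_dim with size 1, so it iterates range(1).
    ranges = [range(1) if d == reduce_dim else range(n)
              for d, n in enumerate(input_shape)]
    for coords in product(*ranges):
        total = 0.0
        for j in range(extent):
            idx = coords[:reduce_dim] + (j,) + coords[reduce_dim + 1:]
            total += input_fn(idx)          # load one input element
        output_fn(coords, total / extent)   # store one output element


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
result = {}


def store(coords, value):
    result[coords] = value


mean_model(lambda idx: data[idx[0]][idx[1]], store, (2, 3), 1)
```

Passing functions instead of buffers lets the caller decide where inputs come from and where results go.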

## variance

variance[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src], mean_value: SIMD[src, 1], correction: Int) -> SIMD[src, 1]

Given a mean, computes the variance of elements in a buffer.

The mean value is used to avoid a second pass over the data:

variance(x) = sum((x - E(x))^2) / (size - correction)

Args:

• src (Buffer[src, src, src]): The buffer.
• mean_value (SIMD[src, 1]): The mean value of the buffer.
• correction (Int): Normalize variance by size - correction.

Returns:

The variance value of the elements in a buffer.

variance[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src], correction: Int) -> SIMD[src, 1]

Computes the variance value of the elements in a buffer.

variance(x) = sum((x - E(x))^2) / (size - correction)

Args:

• src (Buffer[src, src, src]): The buffer.
• correction (Int): Normalize variance by size - correction (Default=1).

Returns:

The variance value of the elements in a buffer.
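As a worked example of the formula above, here is a minimal pure-Python sketch of both overloads. The function names are hypothetical and the code only models the arithmetic; it is not the Mojo implementation.

```python
def variance_with_mean(src, mean_value, correction=1):
    """variance(x) = sum((x - E(x))^2) / (size - correction), with E(x) supplied."""
    return sum((x - mean_value) ** 2 for x in src) / (len(src) - correction)


def variance_model(src, correction=1):
    """Same formula, but the mean is computed from the data first."""
    return variance_with_mean(src, sum(src) / len(src), correction)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population = variance_model(data, correction=0)  # divide by size
sample = variance_model(data)                    # divide by size - 1
```

For this data the mean is 5 and the squared deviations sum to 32, so correction=0 divides by 8 and the default correction=1 divides by 7.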

## all_true

all_true[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> Bool

Returns True if all the elements in a buffer are True and False otherwise.

Args:

• src (Buffer[src, src, src]): The buffer.

Returns:

True if all of the elements of the buffer are True and False otherwise.

## any_true

any_true[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> Bool

Returns True if any of the elements in a buffer are True and False otherwise.

Args:

• src (Buffer[src, src, src]): The buffer.

Returns:

True if any of the elements of the buffer are True and False otherwise.

## none_true

none_true[src: DType, src: Dim, src: AddressSpace](src: Buffer[src, src, src]) -> Bool

Returns True if none of the elements in a buffer are True and False otherwise.

Args:

• src (Buffer[src, src, src]): The buffer.

Returns:

True if none of the elements of the buffer are True and False otherwise.

## argmax

argmax[input: DType, input: Int, input: DimList, input: AddressSpace, output: DType, output: Int, output: DimList, output: AddressSpace](input: NDBuffer[input, input, input, input], axis: Int, output: NDBuffer[output, output, output, output])

Finds the indices of the maximum element along the specified axis.

Args:

• input (NDBuffer[input, input, input, input]): The input tensor.
• axis (Int): The axis.
• output (NDBuffer[output, output, output, output]): The output tensor.

argmax[input: DType, input: Int, input: DimList, input: AddressSpace, axis_buf: DType, axis_buf: Int, axis_buf: DimList, axis_buf: AddressSpace, output: DType, output: Int, output: DimList, output: AddressSpace](input: NDBuffer[input, input, input, input], axis_buf: NDBuffer[axis_buf, axis_buf, axis_buf, axis_buf], output: NDBuffer[output, output, output, output])

Finds the indices of the maximum element along the specified axis.

Args:

• input (NDBuffer[input, input, input, input]): The input tensor.
• axis_buf (NDBuffer[axis_buf, axis_buf, axis_buf, axis_buf]): The axis tensor.
• output (NDBuffer[output, output, output, output]): The output tensor.
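What "indices of the maximum element along the specified axis" means can be shown on a small 2D example. This is a pure-Python sketch with a hypothetical name, `argmax_2d_model`; it assumes that ties resolve to the first occurrence, which is not stated above.

```python
def argmax_2d_model(rows, axis):
    """Illustrative model only: index of the maximum along one axis of a 2D list."""
    # For axis 0, transpose so that each "line" is a column of the input.
    lines = rows if axis == 1 else [list(col) for col in zip(*rows)]
    # max over indices keyed by value returns the first index on ties (assumption).
    return [max(range(len(line)), key=line.__getitem__) for line in lines]


table = [[1, 5, 2],
         [7, 0, 7]]
along_rows = argmax_2d_model(table, 1)  # one index per row
along_cols = argmax_2d_model(table, 0)  # one index per column
```

Replacing `max` with `min` in the sketch gives the corresponding argmin model.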

## argmin

argmin[input: DType, input: Int, input: DimList, input: AddressSpace, output: DType, output: Int, output: DimList, output: AddressSpace](input: NDBuffer[input, input, input, input], axis: Int, output: NDBuffer[output, output, output, output])

Finds the indices of the minimum element along the specified axis.

Args:

• input (NDBuffer[input, input, input, input]): The input tensor.
• axis (Int): The axis.
• output (NDBuffer[output, output, output, output]): The output tensor.

argmin[input: DType, input: Int, input: DimList, input: AddressSpace, axis_buf: DType, axis_buf: Int, axis_buf: DimList, axis_buf: AddressSpace, output: DType, output: Int, output: DimList, output: AddressSpace](input: NDBuffer[input, input, input, input], axis_buf: NDBuffer[axis_buf, axis_buf, axis_buf, axis_buf], output: NDBuffer[output, output, output, output])

Finds the indices of the minimum element along the specified axis.

Args:

• input (NDBuffer[input, input, input, input]): The input tensor.
• axis_buf (NDBuffer[axis_buf, axis_buf, axis_buf, axis_buf]): The axis tensor.
• output (NDBuffer[output, output, output, output]): The output tensor.

## reduce_shape

reduce_shape[input_rank: Int, input_type: DType, axis_type: DType, single_thread_blocking_override: Bool](input_buf: NDBuffer[input_type, input_rank, create_unknown[stdlib::builtin::int::Int][input_rank](), 0], axis_buf: NDBuffer[axis_type, 1, create_unknown[stdlib::builtin::int::Int][1](), 0]) -> StaticIntTuple[input_rank]

Computes the output shape of a reduce operation and asserts that the inputs are compatible.

Parameters:

• input_rank (Int): Rank of the input tensor.
• input_type (DType): Type of the input tensor.
• axis_type (DType): Type of the axis tensor.
• single_thread_blocking_override (Bool): If True, then the operation is run synchronously using a single thread.

Args:

• input_buf (NDBuffer[input_type, input_rank, create_unknown[stdlib::builtin::int::Int][input_rank](), 0]): The input tensor.
• axis_buf (NDBuffer[axis_type, 1, create_unknown[stdlib::builtin::int::Int][1](), 0]): The axis tensor.

Returns:

The output shape.
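A shape computation of this kind might look like the following pure-Python sketch. It rests on assumptions that are not stated above: the reduced axis is kept with extent 1, negative axis values count from the end, and an out-of-range axis is rejected. The name `reduce_shape_model` is hypothetical.

```python
def reduce_shape_model(input_shape, axis):
    """Illustrative model only: shape of the result of reducing along axis."""
    rank = len(input_shape)
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    axis %= rank  # assumption: negative axes count from the last dimension
    # Assumption: the reduced dimension is kept with extent 1.
    return input_shape[:axis] + (1,) + input_shape[axis + 1:]


middle = reduce_shape_model((2, 3, 4), 1)
last = reduce_shape_model((2, 3, 4), -1)
```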

## cumsum

cumsum[dst: DType, dst: Dim, dst: AddressSpace](dst: Buffer[dst, dst, dst], src: Buffer[dst, dst, dst])

Computes the cumulative sum of all elements in a buffer. dst[i] = src[i] + src[i-1] + ... + src[0].

Args:

• dst (Buffer[dst, dst, dst]): The buffer that stores the result of cumulative sum operation.
• src (Buffer[dst, dst, dst]): The buffer of elements for which the cumulative sum is computed.
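The recurrence dst[i] = src[i] + src[i-1] + ... + src[0] can be computed with a single running total, as in this pure-Python sketch. `cumsum_model` is a hypothetical name and Python lists stand in for buffers.

```python
def cumsum_model(dst, src):
    """Illustrative model only: write the running sum of src into dst."""
    running = 0
    for i, value in enumerate(src):
        running += value   # running == src[0] + ... + src[i]
        dst[i] = running


src = [1, 2, 3, 4]
dst = [0] * len(src)
cumsum_model(dst, src)
```

Keeping a running total makes the work linear in the buffer size instead of re-adding the prefix for every element.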