Module

Implements SIMD reductions.

You can import these APIs from the `algorithm` package. For example:

``from algorithm import map_reduce``

## `map_reduce`

`map_reduce[simd_width: Int, size: Dim, type: DType, acc_type: DType, input_gen_fn: fn[DType, Int](Int) capturing -> SIMD[*(0,0), *(0,1)], reduce_vec_to_vec_fn: fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)], reduce_vec_to_scalar_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]](dst: Buffer[size, type], init: SIMD[acc_type, 1]) -> SIMD[acc_type, 1]`

Stores the result of calling `input_gen_fn` in `dst` and simultaneously reduces the result using a custom reduction function.

Parameters:

• simd_width (`Int`): The vector width for the computation.
• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.
• acc_type (`DType`): The dtype of the reduction accumulator.
• input_gen_fn (`fn[DType, Int](Int) capturing -> SIMD[*(0,0), *(0,1)]`): A function that generates inputs to reduce.
• reduce_vec_to_vec_fn (`fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)]`): A mapping function. This function is used to combine (accumulate) two chunks of input data: e.g. we load two `8xfloat32` vectors of elements and need to reduce them into a single `8xfloat32` vector.
• reduce_vec_to_scalar_fn (`fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]`): A reduction function. This function is used to reduce a vector to a scalar, e.g. to reduce an `8xfloat32` vector to a `float32` scalar.

Args:

• dst (`Buffer[size, type]`): The output buffer.
• init (`SIMD[acc_type, 1]`): The initial value to use in the accumulator.

Returns:

The computed reduction value.
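
The following is a minimal Python sketch of the behavior described above, not the Mojo implementation. The name `map_reduce_model`, the use of Python lists as stand-ins for SIMD vectors, the scalar handling of the tail elements, and the requirement that `init` be the identity of the reduction are all assumptions made for illustration.

```python
def map_reduce_model(size, simd_width, input_gen_fn,
                     reduce_vec_to_vec_fn, reduce_vec_to_scalar_fn, init):
    """Hypothetical model: fill dst from input_gen_fn while reducing."""
    dst = [0] * size
    # Assumption: every accumulator lane starts at init, so init should be
    # the identity element of the reduction (0 for a sum).
    acc = [init] * simd_width
    vector_end = size - size % simd_width

    # Full-width chunks: store the generated values and accumulate lane-wise.
    for i in range(0, vector_end, simd_width):
        chunk = input_gen_fn(i, simd_width)
        dst[i:i + simd_width] = chunk
        acc = reduce_vec_to_vec_fn(acc, chunk)

    # Collapse the vector accumulator to a scalar.
    result = reduce_vec_to_scalar_fn(acc)

    # Assumption: leftover elements are processed one at a time.
    for i in range(vector_end, size):
        chunk = input_gen_fn(i, 1)
        dst[i] = chunk[0]
        result = reduce_vec_to_vec_fn([result], chunk)[0]
    return dst, result


def squares(offset, width):
    # Generates `width` consecutive squares starting at index `offset`.
    return [(offset + k) ** 2 for k in range(width)]


def add_lanes(a, b):
    return [x + y for x, y in zip(a, b)]


dst, total = map_reduce_model(10, 4, squares, add_lanes, sum, 0)
print(dst)    # the squares of 0..9
print(total)  # 285
```

Here the generator produces squares, so `dst` ends up holding the squares of 0 through 9 and the returned value is their sum.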

## `reduce`

`reduce[simd_width: Int, size: Dim, type: DType, acc_type: DType, map_fn: fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)], reduce_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]](src: Buffer[size, type], init: SIMD[acc_type, 1]) -> SIMD[acc_type, 1]`

Computes a custom reduction of buffer elements.

Parameters:

• simd_width (`Int`): The vector width for the computation.
• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.
• acc_type (`DType`): The dtype of the reduction accumulator.
• map_fn (`fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)]`): A mapping function. This function is used to combine (accumulate) two chunks of input data: e.g. we load two `8xfloat32` vectors of elements and need to reduce them to a single `8xfloat32` vector.
• reduce_fn (`fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]`): A reduction function. This function is used to reduce a vector to a scalar, e.g. to reduce an `8xfloat32` vector to a `1xfloat32` value.

Args:

• src (`Buffer[size, type]`): The input buffer.
• init (`SIMD[acc_type, 1]`): The initial value to use in the accumulator.

Returns:

The computed reduction value.
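
To illustrate how `map_fn`, `reduce_fn`, and `init` work together, here is a hedged Python model of a one-dimensional reduction. `reduce_model` is a hypothetical name, Python lists stand in for SIMD vectors, and the choice of `init` as the identity element of each operation (0 for sum, negative infinity for max, 1 for product) is an assumption about typical usage rather than something this page states.

```python
import math


def reduce_model(src, simd_width, map_fn, reduce_fn, init):
    """Hypothetical model of a chunked custom reduction over a 1-D buffer."""
    acc = [init] * simd_width
    vector_end = len(src) - len(src) % simd_width

    # Combine full-width chunks lane by lane (the role of map_fn).
    for i in range(0, vector_end, simd_width):
        acc = map_fn(acc, src[i:i + simd_width])

    # Collapse the lanes to one scalar (the role of reduce_fn).
    result = reduce_fn(acc)

    # Assumption: remaining elements are folded in one at a time.
    for x in src[vector_end:]:
        result = map_fn([result], [x])[0]
    return result


def lanewise(op):
    # Lifts a binary scalar operation to operate on two equal-length lists.
    return lambda a, b: [op(x, y) for x, y in zip(a, b)]


data = [3.0, -1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
total = reduce_model(data, 4, lanewise(lambda x, y: x + y), sum, 0.0)
largest = reduce_model(data, 4, lanewise(max), max, -math.inf)
product = reduce_model(data, 4, lanewise(lambda x, y: x * y), math.prod, 1.0)
print(total, largest, product)  # 23.0 9.0 -1080.0
```

Because every lane of the accumulator starts at `init`, a non-identity value would be counted once per lane in this model, which is why the identity element is used in each call.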

`reduce[simd_width: Int, rank: Int, input_shape: DimList, output_shape: DimList, type: DType, acc_type: DType, map_fn: fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)], reduce_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1], reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, acc_type], init: SIMD[acc_type, 1])`

Performs a reduction across reduce_axis of an NDBuffer (src) and stores the result in an NDBuffer (dst).

First src is reshaped into a 3D tensor. Without loss of generality, the three axes will be referred to as [H,W,C], where the axis to reduce across is W, the axes before the reduce axis are packed into H, and the axes after the reduce axis are packed into C. i.e. a tensor with dims [D1, D2, …, Di, …, Dn] reducing across axis i gets packed into a 3D tensor with dims [H, W, C], where H=prod(D1,…,Di-1), W = Di, and C = prod(Di+1,…,Dn).

Parameters:

• simd_width (`Int`): The vector width for the computation.
• rank (`Int`): The rank of the input/output buffers.
• input_shape (`DimList`): The input buffer shape.
• output_shape (`DimList`): The output buffer shape.
• type (`DType`): The buffer elements dtype.
• acc_type (`DType`): The dtype of the reduction accumulator.
• map_fn (`fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)]`): A mapping function. This function is used to combine (accumulate) two chunks of input data: e.g. we load two `8xfloat32` vectors of elements and need to reduce them to a single `8xfloat32` vector.
• reduce_fn (`fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]`): A reduction function. This function is used to reduce a vector to a scalar, e.g. to reduce an `8xfloat32` vector to a `1xfloat32` value.
• reduce_axis (`Int`): The axis to reduce across.

Args:

• src (`NDBuffer[rank, input_shape, type]`): The input buffer.
• dst (`NDBuffer[rank, output_shape, acc_type]`): The output buffer.
• init (`SIMD[acc_type, 1]`): The initial value to use in the accumulator.
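
The [H, W, C] packing described for this overload can be sketched in plain Python. This is an illustrative model only: `reduce_axis_model` is a hypothetical name, the input is assumed to be a flat row-major list, and the loop order is chosen for readability rather than to mirror the vectorized implementation.

```python
from math import prod


def reduce_axis_model(data, shape, axis, combine, init):
    """Hypothetical model: reduce a flat row-major tensor across one axis."""
    h = prod(shape[:axis])       # axes before the reduce axis
    w = shape[axis]              # the reduce axis itself
    c = prod(shape[axis + 1:])   # axes after the reduce axis

    # One output element per (h, c) pair, each starting at init.
    out = [init] * (h * c)
    for i in range(h):
        for j in range(w):
            for k in range(c):
                # Element (i, j, k) of the packed [H, W, C] view.
                value = data[(i * w + j) * c + k]
                out[i * c + k] = combine(out[i * c + k], value)
    return out, (h, w, c)


# A 2x3x4 tensor holding 0..23, summed across axis 1.
sums, packed = reduce_axis_model(
    list(range(24)), (2, 3, 4), 1, lambda a, b: a + b, 0
)
print(packed)  # (2, 3, 4)
print(sums)    # [12, 15, 18, 21, 48, 51, 54, 57]
```

Reducing the first or last axis falls out of the same code, because `prod` of an empty slice is 1 and the corresponding H or C dimension simply collapses.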

## `reduce_boolean`

`reduce_boolean[simd_width: Int, size: Dim, type: DType, reduce_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) capturing -> Bool, continue_fn: fn(Bool) capturing -> Bool](src: Buffer[size, type], init: Bool) -> Bool`

Computes a bool reduction of buffer elements. The reduction exits early if `continue_fn` returns False.

Parameters:

• simd_width (`Int`): The vector width for the computation.
• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.
• reduce_fn (`fn[DType, Int](SIMD[*(0,0), *(0,1)]) capturing -> Bool`): A boolean reduction function. This function is used to reduce a vector to a scalar, e.g. to reduce an `8xfloat32` vector to a `bool`.
• continue_fn (`fn(Bool) capturing -> Bool`): A function to indicate whether we want to continue processing the rest of the iterations. This takes the result of the reduce_fn and returns True to continue processing and False to early exit.

Args:

• src (`Buffer[size, type]`): The input buffer.
• init (`Bool`): The initial value to use.

Returns:

The computed reduction value.
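
The early-exit behavior can be modeled with a short Python sketch. The name `reduce_boolean_model` is hypothetical, Python list slices stand in for SIMD vectors, and returning `init` when no chunk triggers an early exit is an assumption; the page only says that `init` is the initial value to use.

```python
def reduce_boolean_model(src, simd_width, reduce_fn, continue_fn, init):
    """Hypothetical model of a chunked boolean reduction with early exit."""
    chunks_seen = 0
    for start in range(0, len(src), simd_width):
        value = reduce_fn(src[start:start + simd_width])
        chunks_seen += 1
        # continue_fn returns False to stop processing further chunks.
        if not continue_fn(value):
            return value, chunks_seen
    # Assumption: with no early exit, the initial value is the result.
    return init, chunks_seen


# "Any nonzero" style reduction: stop as soon as one chunk reports True.
data = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0]
found, visited = reduce_boolean_model(
    data, 4,
    lambda chunk: any(x != 0 for x in chunk),
    lambda value: not value,
    False,
)
print(found, visited)  # True 2

# "All nonzero" style reduction: stop as soon as one chunk reports False.
all_nonzero, visited_all = reduce_boolean_model(
    [1, 2, 3, 4, 5, 6], 4,
    lambda chunk: all(x != 0 for x in chunk),
    lambda value: value,
    True,
)
print(all_nonzero, visited_all)  # True 2
```

In the first call only two of the three chunks are inspected, which is the saving that `continue_fn` is meant to provide.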

## `max`

`max[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]`

Computes the max element in a buffer.

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.

Returns:

The maximum of the buffer elements.

`max[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])`

Computes the max across reduce_axis of an NDBuffer.

Parameters:

• rank (`Int`): The rank of the input/output buffers.
• input_shape (`DimList`): The input buffer shape.
• output_shape (`DimList`): The output buffer shape.
• type (`DType`): The buffer elements dtype.
• reduce_axis (`Int`): The axis to reduce across.

Args:

• src (`NDBuffer[rank, input_shape, type]`): The input buffer.
• dst (`NDBuffer[rank, output_shape, type]`): The output buffer.

## `min`

`min[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]`

Computes the min element in a buffer.

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.

Returns:

The minimum of the buffer elements.

`min[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])`

Computes the min across reduce_axis of an NDBuffer.

Parameters:

• rank (`Int`): The rank of the input/output buffers.
• input_shape (`DimList`): The input buffer shape.
• output_shape (`DimList`): The output buffer shape.
• type (`DType`): The buffer elements dtype.
• reduce_axis (`Int`): The axis to reduce across.

Args:

• src (`NDBuffer[rank, input_shape, type]`): The input buffer.
• dst (`NDBuffer[rank, output_shape, type]`): The output buffer.

## `sum`

`sum[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]`

Computes the sum of buffer elements.

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.

Returns:

The sum of the buffer elements.

`sum[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])`

Computes the sum across reduce_axis of an NDBuffer.

Parameters:

• rank (`Int`): The rank of the input/output buffers.
• input_shape (`DimList`): The input buffer shape.
• output_shape (`DimList`): The output buffer shape.
• type (`DType`): The buffer elements dtype.
• reduce_axis (`Int`): The axis to reduce across.

Args:

• src (`NDBuffer[rank, input_shape, type]`): The input buffer.
• dst (`NDBuffer[rank, output_shape, type]`): The output buffer.

## `product`

`product[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]`

Computes the product of the buffer elements.

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.

Returns:

The product of the buffer elements.

`product[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])`

Computes the product across reduce_axis of an NDBuffer.

Parameters:

• rank (`Int`): The rank of the input/output buffers.
• input_shape (`DimList`): The input buffer shape.
• output_shape (`DimList`): The output buffer shape.
• type (`DType`): The buffer elements dtype.
• reduce_axis (`Int`): The axis to reduce across.

Args:

• src (`NDBuffer[rank, input_shape, type]`): The input buffer.
• dst (`NDBuffer[rank, output_shape, type]`): The output buffer.

## `mean`

`mean[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]`

Computes the mean value of the elements in a buffer.

Parameters:

• size (`Dim`): The size of the input buffer.
• type (`DType`): The type of the elements of the input buffer and output SIMD vector.

Args:

• src (`Buffer[size, type]`): The buffer of elements for which the mean is computed.

Returns:

The mean value of the elements in the given buffer.

`mean[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])`

Computes the mean across reduce_axis of an NDBuffer.

Parameters:

• rank (`Int`): The rank of the input/output buffers.
• input_shape (`DimList`): The input buffer shape.
• output_shape (`DimList`): The output buffer shape.
• type (`DType`): The buffer elements dtype.
• reduce_axis (`Int`): The axis to reduce across.

Args:

• src (`NDBuffer[rank, input_shape, type]`): The input buffer.
• dst (`NDBuffer[rank, output_shape, type]`): The output buffer.

## `variance`

`variance[size: Dim, type: DType](src: Buffer[size, type], mean_value: SIMD[type, 1], correction: Int) -> SIMD[type, 1]`

Given a mean, computes the variance of elements in a buffer.

The mean value is used to avoid a second pass over the data:

``variance = sum((x - E(x))^2) / (size - correction)``

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.
• mean_value (`SIMD[type, 1]`): The mean value of the buffer.
• correction (`Int`): Normalize variance by size - correction.

Returns:

The variance value of the elements in a buffer.

`variance[size: Dim, type: DType](src: Buffer[size, type], correction: Int) -> SIMD[type, 1]`

Computes the variance value of the elements in a buffer.

``variance(src) = sum((x - E(x))^2) / (size - correction)``

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.
• correction (`Int`): Normalize variance by size - correction (Default=1).

Returns:

The variance value of the elements in a buffer.
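
As a worked example of the formula shared by both overloads, the following Python sketch evaluates `sum((x - E(x))^2) / (size - correction)` directly. `variance_model` is a hypothetical helper, and the optional `mean_value` argument mimics the overload that accepts a precomputed mean.

```python
def variance_model(src, correction=1, mean_value=None):
    """Hypothetical model of the variance formula on this page."""
    size = len(src)
    if mean_value is None:
        # Extra pass over the data when the mean is not supplied.
        mean_value = sum(src) / size
    squared_deviations = sum((x - mean_value) ** 2 for x in src)
    return squared_deviations / (size - correction)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean is 5.0 and the squared deviations sum to 32.0.
population = variance_model(data, correction=0)       # 32 / 8
sample = variance_model(data, correction=1)           # 32 / 7
reused = variance_model(data, correction=0, mean_value=5.0)
print(population, sample, reused)
```

With `correction=0` the result is the population variance, and with `correction=1` it is the sample variance (Bessel's correction).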

## `all_true`

`all_true[size: Dim, type: DType](src: Buffer[size, type]) -> Bool`

Returns True if all the elements in a buffer are True and False otherwise.

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.

Returns:

True if all of the elements of the buffer are True and False otherwise.

## `any_true`

`any_true[size: Dim, type: DType](src: Buffer[size, type]) -> Bool`

Returns True if any of the elements in a buffer are True and False otherwise.

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.

Returns:

True if any of the elements of the buffer are True and False otherwise.

## `none_true`

`none_true[size: Dim, type: DType](src: Buffer[size, type]) -> Bool`

Returns True if none of the elements in a buffer are True and False otherwise.

Parameters:

• size (`Dim`): The buffer size.
• type (`DType`): The buffer elements dtype.

Args:

• src (`Buffer[size, type]`): The buffer.

Returns:

True if none of the elements of the buffer are True and False otherwise.

## `argmax`

`argmax[type: DType, out_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type], axis: Int, output: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type], out_chain: OutputChainPtr)`

Finds the indices of the maximum element along the specified axis.

Parameters:

• type (`DType`): Type of the input tensor.
• out_type (`DType`): Type of the output tensor.
• rank (`Int`): The rank of the input / output.

Args:

• input (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type]`): The input tensor.
• axis (`Int`): The axis.
• output (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type]`): The output tensor.
• out_chain (`OutputChainPtr`): The chain to attach results to.

`argmax[type: DType, out_type: DType, axis_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type], axis_buf: NDBuffer[1, create_unknown[\$builtin::\$int::Int][1](), axis_type], output: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type], out_chain: OutputChainPtr)`

Finds the indices of the maximum element along the specified axis.

Parameters:

• type (`DType`): Type of the input tensor.
• out_type (`DType`): Type of the output tensor.
• axis_type (`DType`): Type of the axis tensor.
• rank (`Int`): The rank of the input / output.

Args:

• input (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type]`): The input tensor.
• axis_buf (`NDBuffer[1, create_unknown[\$builtin::\$int::Int][1](), axis_type]`): The axis tensor.
• output (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type]`): The output tensor.
• out_chain (`OutputChainPtr`): The chain to attach results to.

## `argmin`

`argmin[type: DType, out_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type], axis: Int, output: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type], out_chain: OutputChainPtr)`

Finds the indices of the minimum element along the specified axis.

Parameters:

• type (`DType`): Type of the input tensor.
• out_type (`DType`): Type of the output tensor.
• rank (`Int`): The rank of the input / output.

Args:

• input (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type]`): The input tensor.
• axis (`Int`): The axis.
• output (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type]`): The output tensor.
• out_chain (`OutputChainPtr`): The chain to attach results to.

`argmin[type: DType, out_type: DType, axis_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type], axis_buf: NDBuffer[1, create_unknown[\$builtin::\$int::Int][1](), axis_type], output: NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type], out_chain: OutputChainPtr)`

Finds the indices of the minimum element along the specified axis.

Parameters:

• type (`DType`): Type of the input tensor.
• out_type (`DType`): Type of the output tensor.
• axis_type (`DType`): Type of the axis tensor.
• rank (`Int`): The rank of the input / output.

Args:

• input (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), type]`): The input tensor.
• axis_buf (`NDBuffer[1, create_unknown[\$builtin::\$int::Int][1](), axis_type]`): The axis tensor.
• output (`NDBuffer[rank, create_unknown[\$builtin::\$int::Int][rank](), out_type]`): The output tensor.
• out_chain (`OutputChainPtr`): The chain to attach results to.
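
The index search performed by `argmax` and `argmin` can be illustrated with a small Python model. Everything here is a sketch under stated assumptions: `arg_extreme_model` is a hypothetical name, the input is a flat row-major list, ties resolve to the first occurrence, and the result is returned as a flat list with one index per position outside the reduced axis. The page does not specify tie-breaking or output layout.

```python
from math import prod


def arg_extreme_model(data, shape, axis, pick_max=True):
    """Hypothetical model: index of the max (or min) along one axis."""
    h = prod(shape[:axis])       # positions before the axis
    w = shape[axis]              # length of the searched axis
    c = prod(shape[axis + 1:])   # positions after the axis

    out = []
    for i in range(h):
        for k in range(c):
            # Gather the values that lie along the axis for this position.
            line = [data[(i * w + j) * c + k] for j in range(w)]
            best = max(line) if pick_max else min(line)
            # Assumption: the first occurrence wins on ties.
            out.append(line.index(best))
    return out


# A 2x3 tensor: [[1, 5, 2], [7, 0, 7]]
values = [1, 5, 2, 7, 0, 7]
max_per_row = arg_extreme_model(values, (2, 3), axis=1)
min_per_row = arg_extreme_model(values, (2, 3), axis=1, pick_max=False)
max_per_col = arg_extreme_model(values, (2, 3), axis=0)
print(max_per_row)  # [1, 0]
print(min_per_row)  # [0, 1]
print(max_per_col)  # [1, 0, 1]
```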

## `reduce_shape`

`reduce_shape[input_rank: Int, input_type: DType, axis_type: DType, single_thread_blocking_override: Bool](input_buf: NDBuffer[input_rank, create_unknown[\$builtin::\$int::Int][input_rank](), input_type], axis_buf: NDBuffer[1, create_unknown[\$builtin::\$int::Int][1](), axis_type]) -> StaticIntTuple[input_rank]`

Computes the output shape of a `reduce` operation, and asserts the inputs are compatible.

Parameters:

• input_rank (`Int`): Rank of the input tensor.
• input_type (`DType`): Type of the input tensor.
• axis_type (`DType`): Type of the axis tensor.
• single_thread_blocking_override (`Bool`): Whether this function can block.

Args:

• input_buf (`NDBuffer[input_rank, create_unknown[\$builtin::\$int::Int][input_rank](), input_type]`): The input tensor.
• axis_buf (`NDBuffer[1, create_unknown[\$builtin::\$int::Int][1](), axis_type]`): The axis tensor.

Returns:

The output shape.
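
A possible reading of this shape computation is sketched below in Python. It rests on two assumptions that the page does not confirm: that the reduced axis is kept with extent 1 (suggested by the return type having the same rank as the input), and that the compatibility check amounts to the axis being within range. `reduce_shape_model` is a hypothetical name, and whether negative axes are accepted is not stated, so this sketch rejects them.

```python
def reduce_shape_model(input_shape, axis):
    """Hypothetical model: output shape of a rank-preserving reduction."""
    rank = len(input_shape)
    # Assumption: the compatibility check is a simple range check.
    if not 0 <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    output_shape = list(input_shape)
    # Assumption: the reduced axis is kept with extent 1.
    output_shape[axis] = 1
    return tuple(output_shape)


shape_out = reduce_shape_model((2, 3, 4), 1)
print(shape_out)  # (2, 1, 4)
```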

## `cumsum`

`cumsum[size: Int, type: DType](dst: Buffer[__init__(size), type], src: Buffer[__init__(size), type])`

Computes the cumulative sum of all elements in a buffer. dst[i] = src[i] + src[i-1] + … + src[0].

Parameters:

• size (`Int`): The size of the input and output buffers.
• type (`DType`): The type of the elements of the input and output buffers.

Args:

• dst (`Buffer[__init__(size), type]`): The buffer that stores the result of cumulative sum operation.
• src (`Buffer[__init__(size), type]`): The buffer of elements for which the cumulative sum is computed.
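
The recurrence `dst[i] = src[i] + src[i-1] + … + src[0]` can be computed with a single running total, as in this short Python sketch. `cumsum_model` is a hypothetical name, and returning a new list instead of writing into a caller-provided `dst` buffer is a simplification for illustration.

```python
def cumsum_model(src):
    """Hypothetical model of an inclusive cumulative sum."""
    dst = [0] * len(src)
    running = 0
    for i, value in enumerate(src):
        # dst[i] equals dst[i - 1] + src[i], so one pass is enough.
        running += value
        dst[i] = running
    return dst


prefix_sums = cumsum_model([1, 2, 3, 4])
print(prefix_sums)  # [1, 3, 6, 10]
```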