reduction

Module

Implements SIMD reductions.

You can import these APIs from the algorithm package. For example:

from algorithm import map_reduce

map_reduce

map_reduce[simd_width: Int, size: Dim, type: DType, acc_type: DType, input_gen_fn: fn[DType, Int](Int) capturing -> SIMD[*(0,0), *(0,1)], reduce_vec_to_vec_fn: fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)], reduce_vec_to_scalar_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]](dst: Buffer[size, type], init: SIMD[acc_type, 1]) -> SIMD[acc_type, 1]

Stores the result of calling input_gen_fn in dst and simultaneously reduces the result using a custom reduction function.

Parameters:

  • simd_width (Int): The vector width for the computation.
  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.
  • acc_type (DType): The dtype of the reduction accumulator.
  • input_gen_fn (fn[DType, Int](Int) capturing -> SIMD[*(0,0), *(0,1)]): A function that generates inputs to reduce.
  • reduce_vec_to_vec_fn (fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)]): A mapping function that combines (accumulates) two chunks of input data. For example, after loading two 8xfloat32 vectors of elements, it reduces them into a single 8xfloat32 vector.
  • reduce_vec_to_scalar_fn (fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]): A reduction function that reduces a vector to a scalar. For example, it reduces an 8xfloat32 vector to a float32 scalar.

Args:

  • dst (Buffer[size, type]): The output buffer.
  • init (SIMD[acc_type, 1]): The initial value to use in accumulator.

Returns:

The computed reduction value.
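
The interplay of the three function parameters can be modelled in plain Python. This is a minimal sketch of the semantics described above, not the Mojo implementation: `map_reduce_model`, the list-based "vectors", the helper functions, and the scalar tail loop are all illustrative assumptions. The sketch also assumes that init is the identity of the reduction, because it is broadcast to every accumulator lane.

```python
def map_reduce_model(dst, init, simd_width, input_gen_fn,
                     reduce_vec_to_vec_fn, reduce_vec_to_scalar_fn):
    """Hypothetical reference model: fill dst and reduce in one pass."""
    n = len(dst)
    # Assumption: init is the reduction's identity, so it can seed every lane.
    acc = [init] * simd_width
    i = 0
    # Main loop: generate one full-width chunk, store it, fold it into acc.
    while i + simd_width <= n:
        chunk = input_gen_fn(i, simd_width)
        dst[i:i + simd_width] = chunk
        acc = reduce_vec_to_vec_fn(acc, chunk)
        i += simd_width
    # Collapse the vector accumulator to a single scalar.
    result = reduce_vec_to_scalar_fn(acc)
    # Tail: handle leftover elements one at a time (width-1 chunks).
    while i < n:
        chunk = input_gen_fn(i, 1)
        dst[i] = chunk[0]
        result = reduce_vec_to_vec_fn([result], chunk)[0]
        i += 1
    return result


def gen_squares(start, width):
    """Generate `width` squares starting at index `start`."""
    return [j * j for j in range(start, start + width)]


def add_lanes(a, b):
    """Lane-wise addition of two equal-width vectors."""
    return [x + y for x, y in zip(a, b)]


squares = [0] * 10
total = map_reduce_model(squares, 0, 4, gen_squares, add_lanes, sum)
print(squares, total)
```

After the call, `squares` holds the generated values and `total` holds their sum, so the data is produced and reduced in a single pass.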

reduce

reduce[simd_width: Int, size: Dim, type: DType, acc_type: DType, map_fn: fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)], reduce_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]](src: Buffer[size, type], init: SIMD[acc_type, 1]) -> SIMD[acc_type, 1]

Computes a custom reduction of buffer elements.

Parameters:

  • simd_width (Int): The vector width for the computation.
  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.
  • acc_type (DType): The dtype of the reduction accumulator.
  • map_fn (fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)]): A mapping function that combines (accumulates) two chunks of input data. For example, after loading two 8xfloat32 vectors of elements, it reduces them to a single 8xfloat32 vector.
  • reduce_fn (fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]): A reduction function that reduces a vector to a scalar. For example, it reduces an 8xfloat32 vector to 1xfloat32.

Args:

  • src (Buffer[size, type]): The input buffer.
  • init (SIMD[acc_type, 1]): The initial value to use in accumulator.

Returns:

The computed reduction value.
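
As a rough illustration of how map_fn and reduce_fn divide the work, the following Python sketch computes a maximum. `reduce_model`, `max_lanes`, and the list-based vectors are hypothetical stand-ins rather than the Mojo API, and the sketch assumes init is the identity of the reduction (negative infinity for a maximum).

```python
def reduce_model(src, init, simd_width, map_fn, reduce_fn):
    """Hypothetical reference model of a chunked 1-D reduction."""
    acc = [init] * simd_width  # assumption: init is the identity element
    full = len(src) - len(src) % simd_width
    # map_fn combines the accumulator with each full-width chunk, lane by lane.
    for i in range(0, full, simd_width):
        acc = map_fn(acc, src[i:i + simd_width])
    # reduce_fn collapses the lanes into one scalar.
    result = reduce_fn(acc)
    # Leftover elements are folded in as width-1 vectors.
    for x in src[full:]:
        result = map_fn([result], [x])[0]
    return result


def max_lanes(a, b):
    """Lane-wise maximum of two equal-width vectors."""
    return [max(x, y) for x, y in zip(a, b)]


values = [3.0, -1.0, 7.5, 2.0, 9.25, 0.5, 4.0]
largest = reduce_model(values, float("-inf"), 4, max_lanes, max)
print(largest)
```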

reduce[simd_width: Int, rank: Int, input_shape: DimList, output_shape: DimList, type: DType, acc_type: DType, map_fn: fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)], reduce_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1], reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, acc_type], init: SIMD[acc_type, 1])

Performs a reduction across reduce_axis of an NDBuffer (src) and stores the result in an NDBuffer (dst).

First, src is reshaped into a 3D tensor. Without loss of generality, the three axes are referred to as [H, W, C], where W is the axis to reduce across, the axes before the reduce axis are packed into H, and the axes after it are packed into C. That is, a tensor with dims [D1, D2, …, Di, …, Dn] reduced across axis i is packed into a 3D tensor with dims [H, W, C], where H = prod(D1, …, Di-1), W = Di, and C = prod(Di+1, …, Dn).

Parameters:

  • simd_width (Int): The vector width for the computation.
  • rank (Int): The rank of the input/output buffers.
  • input_shape (DimList): The input buffer shape.
  • output_shape (DimList): The output buffer shape.
  • type (DType): The buffer elements dtype.
  • acc_type (DType): The dtype of the reduction accumulator.
  • map_fn (fn[DType, DType, Int](SIMD[*(0,0), *(0,2)], SIMD[*(0,1), *(0,2)]) capturing -> SIMD[*(0,0), *(0,2)]): A mapping function that combines (accumulates) two chunks of input data. For example, after loading two 8xfloat32 vectors of elements, it reduces them to a single 8xfloat32 vector.
  • reduce_fn (fn[DType, Int](SIMD[*(0,0), *(0,1)]) -> SIMD[*(0,0), 1]): A reduction function that reduces a vector to a scalar. For example, it reduces an 8xfloat32 vector to 1xfloat32.
  • reduce_axis (Int): The axis to reduce across.

Args:

  • src (NDBuffer[rank, input_shape, type]): The input buffer.
  • dst (NDBuffer[rank, output_shape, acc_type]): The output buffer.
  • init (SIMD[acc_type, 1]): The initial value to use in accumulator.
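
The [H, W, C] packing described above can be sketched in Python. This is an illustrative model only: `reduce_axis_model` is a hypothetical name, the input is a flat list assumed to be in row-major order, and a scalar `combine` function stands in for the vectorized map_fn and reduce_fn.

```python
from math import prod


def reduce_axis_model(src_flat, shape, axis, combine, init):
    """Reduce a row-major flat list over `axis` through an [H, W, C] view."""
    h = prod(shape[:axis])       # axes before the reduce axis
    w = shape[axis]              # the reduce axis itself
    c = prod(shape[axis + 1:])   # axes after the reduce axis
    out = [init] * (h * c)       # output viewed as [H, C]
    for hi in range(h):
        for wi in range(w):
            for ci in range(c):
                value = src_flat[(hi * w + wi) * c + ci]
                out[hi * c + ci] = combine(out[hi * c + ci], value)
    return out


# A 2x3x2 tensor holding 0..11, summed across the middle axis.
tensor = list(range(12))
summed = reduce_axis_model(tensor, (2, 3, 2), 1, lambda a, b: a + b, 0)
print(summed)
```

Reducing the first or last axis falls out of the same code, because the product of an empty slice of the shape is 1, which makes H or C collapse to a single entry.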

reduce_boolean

reduce_boolean[simd_width: Int, size: Dim, type: DType, reduce_fn: fn[DType, Int](SIMD[*(0,0), *(0,1)]) capturing -> Bool, continue_fn: fn(Bool) capturing -> Bool](src: Buffer[size, type], init: Bool) -> Bool

Computes a bool reduction of buffer elements. The reduction will early exit if the continue_fn returns False.

Parameters:

  • simd_width (Int): The vector width for the computation.
  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.
  • reduce_fn (fn[DType, Int](SIMD[*(0,0), *(0,1)]) capturing -> Bool): A boolean reduction function that reduces a vector to a scalar. For example, it reduces an 8xfloat32 vector to a bool.
  • continue_fn (fn(Bool) capturing -> Bool): A function to indicate whether we want to continue processing the rest of the iterations. This takes the result of the reduce_fn and returns True to continue processing and False to early exit.

Args:

  • src (Buffer[size, type]): The input buffer.
  • init (Bool): The initial value to use.

Returns:

The computed reduction value.
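
The early-exit behaviour can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the Mojo implementation: `reduce_boolean_model` and the helpers are hypothetical, the last chunk may be shorter than simd_width, and the sketch assumes init is returned when no chunk triggers an early exit.

```python
def reduce_boolean_model(src, init, simd_width, reduce_fn, continue_fn):
    """Hypothetical reference model of a boolean reduction with early exit."""
    for i in range(0, len(src), simd_width):
        # The last chunk may be shorter than simd_width in this sketch.
        result = reduce_fn(src[i:i + simd_width])
        if not continue_fn(result):
            return result  # early exit: remaining chunks are never read
    # Assumption: with no early exit, the initial value is the answer.
    return init


chunks_seen = []


def any_nonzero(chunk):
    """Record the chunk, then report whether it contains a nonzero element."""
    chunks_seen.append(list(chunk))
    return any(x != 0 for x in chunk)


def keep_going(found):
    """Continue only while nothing has been found yet."""
    return not found


data = [0, 0, 0, 5, 0, 0, 0, 0]
found = reduce_boolean_model(data, False, 2, any_nonzero, keep_going)
print(found, len(chunks_seen))
```

Only the first two chunks are inspected here; the search stops as soon as continue_fn returns False.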

max

max[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]

Computes the max element in a buffer.

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.

Returns:

The maximum of the buffer elements.

max[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])

Computes the max across reduce_axis of an NDBuffer.

Parameters:

  • rank (Int): The rank of the input/output buffers.
  • input_shape (DimList): The input buffer shape.
  • output_shape (DimList): The output buffer shape.
  • type (DType): The buffer elements dtype.
  • reduce_axis (Int): The axis to reduce across.

Args:

  • src (NDBuffer[rank, input_shape, type]): The input buffer.
  • dst (NDBuffer[rank, output_shape, type]): The output buffer.

min

min[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]

Computes the min element in a buffer.

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.

Returns:

The minimum of the buffer elements.

min[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])

Computes the min across reduce_axis of an NDBuffer.

Parameters:

  • rank (Int): The rank of the input/output buffers.
  • input_shape (DimList): The input buffer shape.
  • output_shape (DimList): The output buffer shape.
  • type (DType): The buffer elements dtype.
  • reduce_axis (Int): The axis to reduce across.

Args:

  • src (NDBuffer[rank, input_shape, type]): The input buffer.
  • dst (NDBuffer[rank, output_shape, type]): The output buffer.

sum

sum[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]

Computes the sum of buffer elements.

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.

Returns:

The sum of the buffer elements.

sum[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])

Computes the sum across reduce_axis of an NDBuffer.

Parameters:

  • rank (Int): The rank of the input/output buffers.
  • input_shape (DimList): The input buffer shape.
  • output_shape (DimList): The output buffer shape.
  • type (DType): The buffer elements dtype.
  • reduce_axis (Int): The axis to reduce across.

Args:

  • src (NDBuffer[rank, input_shape, type]): The input buffer.
  • dst (NDBuffer[rank, output_shape, type]): The output buffer.

product

product[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]

Computes the product of the buffer elements.

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.

Returns:

The product of the buffer elements.

product[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])

Computes the product across reduce_axis of an NDBuffer.

Parameters:

  • rank (Int): The rank of the input/output buffers.
  • input_shape (DimList): The input buffer shape.
  • output_shape (DimList): The output buffer shape.
  • type (DType): The buffer elements dtype.
  • reduce_axis (Int): The axis to reduce across.

Args:

  • src (NDBuffer[rank, input_shape, type]): The input buffer.
  • dst (NDBuffer[rank, output_shape, type]): The output buffer.

mean

mean[size: Dim, type: DType](src: Buffer[size, type]) -> SIMD[type, 1]

Computes the mean value of the elements in a buffer.

Parameters:

  • size (Dim): The size of the input buffer.
  • type (DType): The type of the elements of the input buffer and output SIMD vector.

Args:

  • src (Buffer[size, type]): The buffer of elements for which the mean is computed.

Returns:

The mean value of the elements in the given buffer.

mean[rank: Int, input_shape: DimList, output_shape: DimList, type: DType, reduce_axis: Int](src: NDBuffer[rank, input_shape, type], dst: NDBuffer[rank, output_shape, type])

Computes the mean across reduce_axis of an NDBuffer.

Parameters:

  • rank (Int): The rank of the input/output buffers.
  • input_shape (DimList): The input buffer shape.
  • output_shape (DimList): The output buffer shape.
  • type (DType): The buffer elements dtype.
  • reduce_axis (Int): The axis to reduce across.

Args:

  • src (NDBuffer[rank, input_shape, type]): The input buffer.
  • dst (NDBuffer[rank, output_shape, type]): The output buffer.

variance

variance[size: Dim, type: DType](src: Buffer[size, type], mean_value: SIMD[type, 1], correction: Int) -> SIMD[type, 1]

Given a mean, computes the variance of elements in a buffer.

The mean value is used to avoid a second pass over the data:

variance = sum((x - E(x))^2) / (size - correction)

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.
  • mean_value (SIMD[type, 1]): The mean value of the buffer.
  • correction (Int): Normalize variance by size - correction.

Returns:

The variance value of the elements in a buffer.

variance[size: Dim, type: DType](src: Buffer[size, type], correction: Int) -> SIMD[type, 1]

Computes the variance value of the elements in a buffer.

variance(src) = sum((x - E(x))^2) / (size - correction)

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.
  • correction (Int): Normalize variance by size - correction (Default=1).

Returns:

The variance value of the elements in a buffer.
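
The formula above, including the role of correction, can be checked with a small Python sketch. `variance_model` is a hypothetical helper that mirrors both overloads: it accepts a precomputed mean or computes one itself.

```python
def variance_model(src, correction=1, mean_value=None):
    """Hypothetical model: sum((x - mean)^2) / (size - correction)."""
    n = len(src)
    if mean_value is None:
        # Without a precomputed mean, an extra pass over the data is needed.
        mean_value = sum(src) / n
    squared_dev = sum((x - mean_value) ** 2 for x in src)
    return squared_dev / (n - correction)


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population_var = variance_model(samples, correction=0)  # divides by size
sample_var = variance_model(samples, correction=1)      # divides by size - 1
print(population_var, sample_var)
```

With correction set to 0 the result is the population variance; with correction set to 1 it is the unbiased sample variance (Bessel's correction).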

all_true

all_true[size: Dim, type: DType](src: Buffer[size, type]) -> Bool

Returns True if all the elements in a buffer are True and False otherwise.

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.

Returns:

True if all of the elements of the buffer are True and False otherwise.

any_true

any_true[size: Dim, type: DType](src: Buffer[size, type]) -> Bool

Returns True if any of the elements in a buffer are True and False otherwise.

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.

Returns:

True if any of the elements of the buffer are True and False otherwise.

none_true

none_true[size: Dim, type: DType](src: Buffer[size, type]) -> Bool

Returns True if none of the elements in a buffer are True and False otherwise.

Parameters:

  • size (Dim): The buffer size.
  • type (DType): The buffer elements dtype.

Args:

  • src (Buffer[size, type]): The buffer.

Returns:

True if none of the elements of the buffer are True and False otherwise.

argmax

argmax[type: DType, out_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type], axis: Int, output: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type], out_chain: OutputChainPtr)

Finds the indices of the maximum element along the specified axis.

Parameters:

  • type (DType): Type of the input tensor.
  • out_type (DType): Type of the output tensor.
  • rank (Int): The rank of the input / output.

Args:

  • input (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type]): The input tensor.
  • axis (Int): The axis.
  • output (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type]): The output tensor.
  • out_chain (OutputChainPtr): The chain to attach results to.

argmax[type: DType, out_type: DType, axis_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type], axis_buf: NDBuffer[1, create_unknown[$builtin::$int::Int][1](), axis_type], output: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type], out_chain: OutputChainPtr)

Finds the indices of the maximum element along the specified axis.

Parameters:

  • type (DType): Type of the input tensor.
  • out_type (DType): Type of the output tensor.
  • axis_type (DType): Type of the axis tensor.
  • rank (Int): The rank of the input / output.

Args:

  • input (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type]): The input tensor.
  • axis_buf (NDBuffer[1, create_unknown[$builtin::$int::Int][1](), axis_type]): The axis tensor.
  • output (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type]): The output tensor.
  • out_chain (OutputChainPtr): The chain to attach results to.

argmin

argmin[type: DType, out_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type], axis: Int, output: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type], out_chain: OutputChainPtr)

Finds the indices of the minimum element along the specified axis.

Parameters:

  • type (DType): Type of the input tensor.
  • out_type (DType): Type of the output tensor.
  • rank (Int): The rank of the input / output.

Args:

  • input (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type]): The input tensor.
  • axis (Int): The axis.
  • output (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type]): The output tensor.
  • out_chain (OutputChainPtr): The chain to attach results to.

argmin[type: DType, out_type: DType, axis_type: DType, rank: Int](input: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type], axis_buf: NDBuffer[1, create_unknown[$builtin::$int::Int][1](), axis_type], output: NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type], out_chain: OutputChainPtr)

Finds the indices of the minimum element along the specified axis.

Parameters:

  • type (DType): Type of the input tensor.
  • out_type (DType): Type of the output tensor.
  • axis_type (DType): Type of the axis tensor.
  • rank (Int): The rank of the input / output.

Args:

  • input (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), type]): The input tensor.
  • axis_buf (NDBuffer[1, create_unknown[$builtin::$int::Int][1](), axis_type]): The axis tensor.
  • output (NDBuffer[rank, create_unknown[$builtin::$int::Int][rank](), out_type]): The output tensor.
  • out_chain (OutputChainPtr): The chain to attach results to.
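
To make the meaning of "indices along the specified axis" concrete for both argmax and argmin, here is a small Python model for a 2-D nested list. `arg_extreme_model` is a hypothetical helper, not part of the module. Two details are assumptions of the sketch rather than documented behaviour: ties resolve to the first occurrence, and the reduced axis is kept with size 1 because the output has the same rank as the input.

```python
def arg_extreme_model(rows, axis, pick):
    """Hypothetical model for a 2-D nested list; pick is max or min."""
    def best_index(seq):
        # Ties resolve to the first occurrence (an assumption of this sketch).
        return pick(range(len(seq)), key=seq.__getitem__)

    if axis == 1:
        # One index per row; the reduced axis is kept with size 1.
        return [[best_index(row)] for row in rows]
    # axis == 0: one index per column.
    return [[best_index(col) for col in zip(*rows)]]


grid = [[1, 9, 3],
        [7, 2, 8]]
argmax_rows = arg_extreme_model(grid, 1, max)
argmax_cols = arg_extreme_model(grid, 0, max)
argmin_rows = arg_extreme_model(grid, 1, min)
print(argmax_rows, argmax_cols, argmin_rows)
```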

reduce_shape

reduce_shape[input_rank: Int, input_type: DType, axis_type: DType, single_thread_blocking_override: Bool](input_buf: NDBuffer[input_rank, create_unknown[$builtin::$int::Int][input_rank](), input_type], axis_buf: NDBuffer[1, create_unknown[$builtin::$int::Int][1](), axis_type]) -> StaticIntTuple[input_rank]

Computes the output shape of a reduction operation and asserts that the inputs are compatible.

Parameters:

  • input_rank (Int): Rank of the input tensor.
  • input_type (DType): Type of the input tensor.
  • axis_type (DType): Type of the axis tensor.
  • single_thread_blocking_override (Bool): Whether this function can block.

Args:

  • input_buf (NDBuffer[input_rank, create_unknown[$builtin::$int::Int][input_rank](), input_type]): The input tensor.
  • axis_buf (NDBuffer[1, create_unknown[$builtin::$int::Int][1](), axis_type]): The axis tensor.

Returns:

The output shape.

cumsum

cumsum[size: Int, type: DType](dst: Buffer[__init__(size), type], src: Buffer[__init__(size), type])

Computes the cumulative sum of all elements in a buffer. dst[i] = src[i] + src[i-1] + … + src[0].

Parameters:

  • size (Int): The size of the input and output buffers.
  • type (DType): The type of the elements of the input and output buffers.

Args:

  • dst (Buffer[__init__(size), type]): The buffer that stores the result of cumulative sum operation.
  • src (Buffer[__init__(size), type]): The buffer of elements for which the cumulative sum is computed.
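
The recurrence dst[i] = src[i] + src[i-1] + … + src[0] amounts to keeping a running total. A minimal Python sketch, with `cumsum_model` as a hypothetical name that mirrors the (dst, src) argument order:

```python
def cumsum_model(dst, src):
    """Hypothetical model: write the inclusive prefix sums of src into dst."""
    running = 0
    for i, x in enumerate(src):
        running += x      # running now equals src[0] + ... + src[i]
        dst[i] = running


source = [1, 2, 3, 4]
prefix = [0] * len(source)
cumsum_model(prefix, source)
print(prefix)
```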