Skip to main content
Log in

Mojo function

prefix_sum

prefix_sum[type: DType, //, intermediate_type: DType = type, *, output_type: DType = type, exclusive: Bool = False](x: SIMD[type, 1]) -> SIMD[output_type, 1]

Computes a warp-level prefix sum (scan) operation.

Performs an inclusive or exclusive prefix sum across threads in a warp using a parallel scan algorithm with warp shuffle operations. This implements an efficient parallel scan with logarithmic complexity.

For example, if we have a warp with the following elements:

[x0,x1,x2,x3,x4][x_0, x_1, x_2, x_3, x_4]

The prefix sum is:

[x0,x0+x1,x0+x1+x2,x0+x1+x2+x3,x0+x1+x2+x3+x4][x_0, x_0 + x_1, x_0 + x_1 + x_2, x_0 + x_1 + x_2 + x_3, x_0 + x_1 + x_2 + x_3 + x_4]

Parameters:

  • type (DType): The data type of the input SIMD elements.
  • intermediate_type (DType): Type used for intermediate calculations (defaults to input type).
  • output_type (DType): The desired output data type (defaults to input type).
  • exclusive (Bool): If True, performs exclusive scan where each thread receives the sum of all previous threads. If False (default), performs inclusive scan where each thread receives the sum including its own value.

Args:

  • x (SIMD[type, 1]): The SIMD value to include in the prefix sum.

Returns:

A scalar containing the prefix sum at the current thread's position in the warp, cast to the specified output type.