
Mojo function

sum

sum[dtype: DType, width: Int, //, *, block_size: Int, broadcast: Bool = True](val: SIMD[dtype, width]) -> SIMD[dtype, width]

Computes the sum of values across all threads in a block.

Performs a parallel reduction using warp-level operations and shared memory to compute the sum across all threads in the block.

Parameters:

  • dtype (DType): The data type of the SIMD elements.
  • width (Int): The number of elements in each SIMD vector.
  • block_size (Int): The total number of threads in the block.
  • broadcast (Bool): If True, the final sum is broadcast to all threads in the block. If False, only the first thread will have the complete sum.

Args:

  • val (SIMD): The SIMD value to reduce. Each thread contributes its value to the sum.

Returns:

SIMD: If broadcast is True, each thread in the block will receive the final sum. Otherwise, only the first thread will have the complete sum.
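
To make the reduction and the broadcast flag concrete, here is a minimal host-side sketch in Python. It models the documented behavior only; it is not the library's implementation. The names `WARP_SIZE` and `simulated_block_sum` are hypothetical, the warp width of 32 is an assumption that the reference above does not state, and `None` is used only to mark threads whose result the documentation leaves unspecified.

```python
WARP_SIZE = 32  # assumption: a common warp width; not stated in the reference above


def simulated_block_sum(values, broadcast=True):
    """Model a block-wide sum where values[i] is the contribution of thread i."""
    block_size = len(values)

    # Stage 1: each warp reduces its own threads
    # (stands in for the warp-level operations).
    warp_partials = [
        sum(values[start:start + WARP_SIZE])
        for start in range(0, block_size, WARP_SIZE)
    ]

    # Stage 2: the per-warp partial sums are staged in a small buffer
    # (stands in for shared memory) and combined into one total.
    shared = list(warp_partials)
    total = sum(shared)

    if broadcast:
        # Every thread receives the final sum.
        return [total] * block_size

    # Only the first thread is documented to hold the complete sum;
    # None marks threads whose contents this model does not specify.
    return [total] + [None] * (block_size - 1)


per_thread = list(range(256))  # thread i contributes the value i
everyone = simulated_block_sum(per_thread, broadcast=True)
first_only = simulated_block_sum(per_thread, broadcast=False)
```

With `broadcast=True` every entry of `everyone` holds the same total. With `broadcast=False` only `first_only[0]` should be relied on, which mirrors the Returns description above.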

sum[dtype: DType, width: Int, //, *, block_dim_x: Int, block_dim_y: Int, block_dim_z: Int = 1, broadcast: Bool = True](val: SIMD[dtype, width]) -> SIMD[dtype, width]

Computes the sum of values across all threads in a multi-dimensional block.

Performs a parallel reduction using warp-level operations and shared memory to compute the sum across all threads in the block. Thread IDs are linearized in row-major order (x varies fastest): x + y * block_dim_x + z * block_dim_x * block_dim_y.

Parameters:

  • dtype (DType): The data type of the SIMD elements.
  • width (Int): The number of elements in each SIMD vector.
  • block_dim_x (Int): The number of threads along the X dimension.
  • block_dim_y (Int): The number of threads along the Y dimension.
  • block_dim_z (Int): The number of threads along the Z dimension (default: 1).
  • broadcast (Bool): If True, the final sum is broadcast to all threads in the block. If False, only the first thread will have the complete sum.

Args:

  • val (SIMD): The SIMD value to reduce. Each thread contributes its value to the sum.

Returns:

SIMD: If broadcast is True, each thread in the block will receive the final sum. Otherwise, only the first thread will have the complete sum.
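
The linearization formula for multi-dimensional blocks can be checked with a short host-side sketch in Python. This is an illustrative model, not the library's code. `linear_thread_id` and `simulated_block_sum_3d` are hypothetical helper names, and `value_of` is an assumed callback that supplies each thread's contribution.

```python
def linear_thread_id(x, y, z, dim_x, dim_y):
    """Flatten (x, y, z) thread coordinates with x varying fastest."""
    return x + y * dim_x + z * dim_x * dim_y


def simulated_block_sum_3d(value_of, dim_x, dim_y, dim_z=1):
    """Sum value_of(x, y, z) over every thread of a dim_x * dim_y * dim_z block."""
    block_size = dim_x * dim_y * dim_z
    flat = [None] * block_size

    for z in range(dim_z):
        for y in range(dim_y):
            for x in range(dim_x):
                # Each thread occupies exactly one slot in the flattened block.
                flat[linear_thread_id(x, y, z, dim_x, dim_y)] = value_of(x, y, z)

    # Once flattened, the reduction is the same as in the one-dimensional case.
    return sum(flat)


# Thread (3, 2, 1) in an 8 x 4 x 2 block: 3 + 2*8 + 1*8*4 = 51
example_id = linear_thread_id(3, 2, 1, dim_x=8, dim_y=4)

# Every thread contributes 1, so the sum equals the thread count: 8*4*2 = 64
thread_count = simulated_block_sum_3d(lambda x, y, z: 1, 8, 4, 2)
```

Because the mapping assigns each coordinate triple a unique index in the range 0 to block_dim_x * block_dim_y * block_dim_z - 1, a multi-dimensional block reduces to the same flat sum as a one-dimensional block with that many threads.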
