Mojo function
sum
sum[val_type: DType, simd_width: Int, //](val: SIMD[val_type, simd_width]) -> SIMD[val_type, simd_width]
Computes the sum of values across all lanes in a warp.
This is a convenience wrapper around lane_group_sum that operates on the entire warp. It performs a parallel reduction using warp shuffle operations to compute the sum across all lanes in the warp.
Parameters:
- val_type (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in the SIMD vector.
Args:
- val (SIMD[val_type, simd_width]): The SIMD value to reduce. Each lane contributes its value to the sum.
Returns:
A SIMD value of the same type as val, containing the sum across all lanes in the warp. The sum is broadcast, so every lane receives the same result.
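To illustrate how a shuffle-based reduction can leave the same sum in every lane, the following plain-Python sketch models one common pattern, a butterfly (XOR-shuffle) reduction, on the CPU. It is not Mojo code and does not call the library. The helpers `shuffle_xor` and `warp_sum` are hypothetical, the warp is assumed to have 32 lanes, and each lane's SIMD vector is modeled as a Python list that is summed element by element across lanes (an assumption for this model). The library's actual shuffle pattern may differ.

```python
# Plain-Python model of a butterfly (XOR-shuffle) warp sum.
# Assumptions: a 32-lane warp; each lane's SIMD vector is a Python list;
# shuffle_xor and warp_sum are hypothetical helpers, not Mojo APIs.

WARP_SIZE = 32  # assumed warp size for this model


def shuffle_xor(lanes, mask):
    """Return, for every lane, the vector held by lane (lane_id ^ mask)."""
    return [lanes[lane_id ^ mask] for lane_id in range(len(lanes))]


def warp_sum(lanes):
    """Sum vectors element-wise across all lanes; every lane gets the total."""
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a nonzero power of two")
    offset = n // 2
    while offset > 0:
        partner = shuffle_xor(lanes, offset)
        # Each lane adds its partner's vector, element by element.
        lanes = [
            [a + b for a, b in zip(mine, theirs)]
            for mine, theirs in zip(lanes, partner)
        ]
        offset //= 2
    return lanes


# Usage: simd_width = 2, lane i contributes [i, 1].
lanes = [[lane_id, 1] for lane_id in range(WARP_SIZE)]
result = warp_sum(lanes)
print(result[0])                                # [496, 32]
print(all(vec == result[0] for vec in result))  # True
```

After log2(32) = 5 shuffle-and-add steps, every lane holds the same total, which matches the broadcast behavior described under Returns.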
sum[intermediate_type: DType, *, reduction_method: ReductionMethod, output_type: DType](x: SIMD[dtype, size]) -> SIMD[output_type, 1]
Performs a warp-level reduction to compute the sum of values across threads.
This function provides two reduction methods:
- Warp shuffle: Uses warp shuffle operations to efficiently sum values across threads
- Tensor core: Uses tensor core operations for the reduction, casting the input to intermediate_type first
The tensor core method will cast the input to the specified intermediate type before reduction to ensure compatibility with tensor core operations. The warp shuffle method requires the output type to match the input type.
Constraints:
- For warp shuffle reduction, output_type must match the input value type.
- For tensor core reduction, input will be cast to intermediate_type.
Parameters:
- intermediate_type (DType): The data type to cast to when using tensor core reduction.
- reduction_method (ReductionMethod): WARP for warp shuffle or TENSOR_CORE for tensor core reduction.
- output_type (DType): The desired output data type for the reduced value.
Args:
- x (SIMD[dtype, size]): The SIMD value to reduce across the warp.
Returns:
A scalar containing the sum of the input values across all threads in the warp, cast to the specified output type.
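The following plain-Python sketch models the documented dispatch logic of this overload: the WARP method rejects an output_type that differs from the input type, while the TENSOR_CORE method casts every input element to intermediate_type, reduces, and casts the result to output_type. It is not the Mojo implementation. The ReductionMethod enum below, the string dtype names, and the `cast` and `warp_sum_scalar` helpers are hypothetical stand-ins; tensor core hardware and accumulator precision are not modeled. Because the return type is a single-element SIMD value, the sketch also assumes the elements within each lane's vector are summed into the one result.

```python
# Plain-Python model of the documented semantics of the second sum overload.
# ReductionMethod, cast, and warp_sum_scalar are hypothetical stand-ins,
# not Mojo APIs; tensor core behavior and accumulator precision are not modeled.
import struct
from enum import Enum


class ReductionMethod(Enum):
    WARP = "warp"
    TENSOR_CORE = "tensor_core"


def cast(value, dtype):
    """Round a Python number to the named dtype (only three are modeled)."""
    if dtype == "float16":
        return struct.unpack("e", struct.pack("e", value))[0]
    if dtype == "float32":
        return struct.unpack("f", struct.pack("f", value))[0]
    if dtype == "int32":
        return int(value)  # truncates toward zero; overflow is not modeled
    raise ValueError(f"unsupported dtype: {dtype}")


def warp_sum_scalar(lanes, input_type, intermediate_type, reduction_method, output_type):
    """Reduce every element of every lane's vector to one output_type value."""
    if reduction_method is ReductionMethod.WARP:
        # Documented constraint: output_type must match the input type.
        if output_type != input_type:
            raise TypeError("WARP reduction requires output_type == input type")
        return cast(sum(x for vec in lanes for x in vec), output_type)
    # TENSOR_CORE: cast inputs to intermediate_type, reduce, cast the result.
    total = sum(cast(x, intermediate_type) for vec in lanes for x in vec)
    return cast(total, output_type)


# Usage: 32 lanes, each holding a one-element vector [0.1].
lanes = [[0.1] for _ in range(32)]
print(warp_sum_scalar(lanes, "float32", "float16", ReductionMethod.TENSOR_CORE, "float32"))
# 3.19921875 (0.1 rounds to 0.0999755859375 in float16 before the sum)
```

The usage line shows why the choice of intermediate_type is visible in the result: the inputs are rounded to it before the reduction takes place.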