Mojo function
block_reduce
block_reduce[BLOCK_SIZE: Int, reduce_fn: fn[dtype: DType, width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width], dtype: DType, simd_width: Int](val: SIMD[dtype, simd_width], init: Scalar[dtype]) -> Scalar[dtype]
Performs a block-level reduction of a single SIMD value across all threads in a GPU thread block using warp-level primitives and shared memory.
Parameters:
- BLOCK_SIZE (
Int): The number of threads per block. - reduce_fn (
fn[dtype: DType, width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width]): The binary reduction function. - dtype (
DType): The data type of the elements. - simd_width (
Int): The SIMD vector width.
Args:
- val (
SIMD): The per-thread SIMD value to reduce. - init (
Scalar): The identity value for the reduction.
Returns:
Scalar: The reduced scalar result (valid on thread 0).
block_reduce[BLOCK_SIZE: Int, num_reductions: Int, reduce_fn: fn[dtype: DType, width: Int, reduction_idx: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width], dtype: DType, simd_width: Int](val: StaticTuple[SIMD[dtype, simd_width], num_reductions], init: StaticTuple[Scalar[dtype], num_reductions]) -> StaticTuple[Scalar[dtype], num_reductions]
Performs a block-level reduction of multiple fused SIMD values across all threads in a GPU thread block using warp shuffles and shared memory.
Parameters:
- BLOCK_SIZE (
Int): The number of threads per block. - num_reductions (
Int): The number of fused reductions to perform. - reduce_fn (
fn[dtype: DType, width: Int, reduction_idx: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width]): The binary reduction function, parameterized by reduction index. - dtype (
DType): The data type of the elements. - simd_width (
Int): The SIMD vector width.
Args:
- val (
StaticTuple): The per-thread SIMD values to reduce, one per reduction. - init (
StaticTuple): The identity values for each reduction.
Returns:
StaticTuple: The reduced scalar results (valid on thread 0).
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!