Mojo function
max
max[dtype: DType, width: Int, //, *, block_size: Int, broadcast: Bool = True](val: SIMD[dtype, width]) -> SIMD[dtype, width]
Computes the maximum value across all threads in a block.
Performs a parallel reduction using warp-level operations and shared memory to find the global maximum across all threads in the block.
Parameters:
- dtype (DType): The data type of the SIMD elements.
- width (Int): The number of elements in each SIMD vector.
- block_size (Int): The total number of threads in the block.
- broadcast (Bool): If True, the final reduced value is broadcast to all threads in the block. If False, only the first thread will have the complete result.
Args:
- val (SIMD): The SIMD value to reduce. Each thread contributes its value to find the maximum.
Returns:
SIMD: If broadcast is True, each thread in the block will receive the maximum
value across the entire block. Otherwise, only the first thread will
have the complete result.
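The reduction and the effect of broadcast can be illustrated with a small CPU-side model. This is a plain Python sketch, not Mojo GPU code and not the library's implementation. It assumes a warp width of 32 threads and assumes the maximum is taken lane by lane across threads, which the SIMD return type suggests. The names `WARP_SIZE`, `lane_max`, and `block_max_model` are hypothetical, and `None` stands in for the unspecified value held by threads other than the first when broadcast is False.

```python
WARP_SIZE = 32  # assumed warp width; the real value depends on the hardware


def lane_max(vectors):
    """Element-wise maximum over a list of equal-width vectors."""
    return [max(column) for column in zip(*vectors)]


def block_max_model(thread_vals, broadcast=True):
    """Model a block-wide max; thread_vals[i] is the vector held by thread i."""
    block_size = len(thread_vals)

    # Stage 1: each warp reduces the values of its own threads.
    warp_partials = [
        lane_max(thread_vals[start:start + WARP_SIZE])
        for start in range(0, block_size, WARP_SIZE)
    ]

    # Stage 2: the per-warp partial results are combined into one vector
    # (on a GPU this exchange would go through shared memory).
    result = lane_max(warp_partials)

    if broadcast:
        # Every thread receives the block-wide maximum.
        return [list(result) for _ in range(block_size)]

    # Only thread 0 is guaranteed to hold the complete result; None marks
    # the other threads' values as unspecified in this model.
    return [list(result)] + [None] * (block_size - 1)


# 64 threads, each holding a 2-element vector.
vals = [[float(t), float(-t)] for t in range(64)]
vals[37] = [1000.0, 5.0]

all_threads = block_max_model(vals, broadcast=True)
first_only = block_max_model(vals, broadcast=False)

print(all_threads[0], all_threads[63])  # both hold the block-wide maximum
print(first_only[0], first_only[1])     # only thread 0 holds it
```

In this model, a caller that needs the result in only one thread can pass broadcast=False and read it from thread 0. Every other caller should keep the default.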
max[dtype: DType, width: Int, //, *, block_dim_x: Int, block_dim_y: Int, block_dim_z: Int = 1, broadcast: Bool = True](val: SIMD[dtype, width]) -> SIMD[dtype, width]
Computes the maximum value across all threads in a multi-dimensional block.
Performs a parallel reduction using warp-level operations and shared memory
to find the global maximum across all threads in the block. Thread IDs are
linearized in row-major order: x + y * dim_x + z * dim_x * dim_y.
Parameters:
- dtype (DType): The data type of the SIMD elements.
- width (Int): The number of elements in each SIMD vector.
- block_dim_x (Int): The number of threads along the X dimension.
- block_dim_y (Int): The number of threads along the Y dimension.
- block_dim_z (Int): The number of threads along the Z dimension (default: 1).
- broadcast (Bool): If True, the final reduced value is broadcast to all threads in the block. If False, only the first thread will have the complete result.
Args:
- val (SIMD): The SIMD value to reduce. Each thread contributes its value to find the maximum.
Returns:
SIMD: If broadcast is True, each thread in the block will receive the maximum
value across the entire block. Otherwise, only the first thread will
have the complete result.
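The linearization formula x + y * dim_x + z * dim_x * dim_y can be checked with a short host-side calculation. The following is a plain Python sketch rather than Mojo GPU code. The helper name `linear_thread_id` and the 8 x 4 x 2 block shape are hypothetical, chosen only for illustration.

```python
def linear_thread_id(x, y, z, dim_x, dim_y):
    """Linearize a 3D thread index, with x varying fastest."""
    return x + y * dim_x + z * dim_x * dim_y


# Hypothetical block shape: 8 x 4 x 2 = 64 threads.
dim_x, dim_y, dim_z = 8, 4, 2

# Enumerate threads with x innermost, matching the formula's ordering.
ids = [
    linear_thread_id(x, y, z, dim_x, dim_y)
    for z in range(dim_z)
    for y in range(dim_y)
    for x in range(dim_x)
]

# Each thread maps to a distinct ID in 0 .. dim_x * dim_y * dim_z - 1.
print(ids == list(range(dim_x * dim_y * dim_z)))

# Thread (x=3, y=2, z=1) maps to 3 + 2 * 8 + 1 * 8 * 4.
print(linear_thread_id(3, 2, 1, dim_x, dim_y))
```

Because the mapping assigns every thread a distinct linear ID, the multi-dimensional block can be reduced in the same way as a one-dimensional block with dim_x * dim_y * dim_z threads.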