Mojo function

max

max[dtype: DType, width: Int, //, *, block_size: Int, broadcast: Bool = True](val: SIMD[dtype, width]) -> SIMD[dtype, width]

Computes the maximum value across all threads in a block.

Performs a parallel reduction using warp-level operations and shared memory to find the block-wide maximum, that is, the maximum across all threads in the block.
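The reduction described above can be modelled on the CPU to make the data flow easier to follow. The sketch below is plain Python, not Mojo, and it is not the library's implementation. It assumes a two-stage scheme (each warp reduces its own lanes, then the per-warp results are combined through a shared buffer) and a warp width of 32. The names `WARP_SIZE`, `warp_max`, and `block_max` are hypothetical and exist only for this illustration.

```python
WARP_SIZE = 32  # assumed warp width; some GPUs use a different value


def warp_max(lane_values):
    """Model a shuffle-down tree reduction inside one warp."""
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Every lane reads the value held `offset` lanes above it.
        # The comprehension reads only the old list, so all lanes
        # update "simultaneously", as they would on the GPU.
        vals = [
            max(vals[i], vals[i + offset]) if i + offset < len(vals) else vals[i]
            for i in range(len(vals))
        ]
        offset //= 2
    return vals[0]  # lane 0 now holds the warp maximum


def block_max(thread_values, broadcast=True):
    """Model the block-wide maximum; returns one entry per thread."""
    block_size = len(thread_values)
    # The second stage is a single warp-level pass over the per-warp results.
    assert 0 < block_size <= WARP_SIZE * WARP_SIZE

    # Stage 1: each warp reduces its own lanes.
    warp_results = [
        warp_max(thread_values[start:start + WARP_SIZE])
        for start in range(0, block_size, WARP_SIZE)
    ]

    # Stage 2: the per-warp results stand in for shared memory and are
    # reduced once more to give the block-wide maximum.
    block_result = warp_max(warp_results)

    if broadcast:
        return [block_result] * block_size
    # Without broadcast, only thread 0 is modelled as holding the result;
    # None marks threads whose value should not be relied on (an assumption
    # of this model, not documented behaviour).
    return [block_result] + [None] * (block_size - 1)
```

Calling `block_max(values)` on a list with one entry per thread returns the same maximum for every thread, while `block_max(values, broadcast=False)` places it only in the first entry.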

Parameters:

dtype (DType): The data type of the SIMD elements. Inferred from val.
width (Int): The number of elements in each SIMD vector. Inferred from val.
block_size (Int): The total number of threads in the block. Keyword-only, with no default, so it must be specified.
broadcast (Bool): If True, the final reduced value is broadcast to all threads in the block. If False, only the first thread holds the complete result. Defaults to True.

Args:

val (SIMD): The SIMD value to reduce. Each thread contributes its value to find the maximum.

Returns:

SIMD: If broadcast is True, each thread in the block receives the maximum value across the entire block. Otherwise, only the first thread holds the complete result.
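Because the return type has the same width as the input, the reduction is presumably element-wise: lane k of the result is the maximum of lane k over all threads. The text above does not state this, so treat it as an assumption. A minimal Python model of that reading, using tuples in place of SIMD vectors and hypothetical helper names (`simd_max`, `block_max_simd`):

```python
from functools import reduce


def simd_max(a, b):
    """Element-wise maximum of two equal-width vectors (tuples here)."""
    return tuple(max(x, y) for x, y in zip(a, b))


def block_max_simd(per_thread_vectors):
    """Combine one vector per thread into a single vector of lane-wise maxima."""
    return reduce(simd_max, per_thread_vectors)


# Three threads, each contributing a width-2 vector.
threads = [(1.0, 8.0), (5.0, 2.0), (3.0, 4.0)]
result = block_max_simd(threads)  # (5.0, 8.0)
```

Under this reading, a thread that needs a single scalar maximum would still have to reduce the lanes of the returned vector itself.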