Mojo function

row_mean_of_squares_gpu

def row_mean_of_squares_gpu[in_dtype: DType, out_dtype: DType, //, input_fn: def[width: Int](row: Int, col: Int) capturing -> SIMD[in_dtype, width], output_fn: def(row: Int, val: Scalar[out_dtype]) capturing -> None, pdl_level: PDLLevel = PDLLevel.ON](rows: Int, cols: Int, ctx: DeviceContext)

Launches the GPU mean-of-squares reduction, computing the mean of the squared elements of each row with one thread block per row.

The primary target is SM100 (B200). The kernel uses only block_reduce, so it is portable to other GPUs.
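
To make the per-row result concrete, here is a minimal CPU reference sketch in Python. It is not the Mojo API: the name `row_mean_of_squares_reference` is hypothetical, it reads one scalar at a time rather than a SIMD vector of `width` elements, and it assumes the result for each row is the sum of squared elements divided by `cols`. The accumulation type and rounding behavior of the real kernel are not specified here.

```python
def row_mean_of_squares_reference(rows, cols, input_fn, output_fn):
    """CPU reference (assumed semantics): emit mean(x**2) for each row."""
    for row in range(rows):  # the GPU version assigns one block to each row
        total = 0.0
        for col in range(cols):
            x = input_fn(row, col)  # scalar load; the kernel loads SIMD vectors
            total += x * x
        output_fn(row, total / cols)  # one scalar result per row


# Hypothetical usage: callbacks read from and write to plain Python containers.
data = [[1.0, 2.0, 3.0], [0.0, 4.0, 0.0]]
results = {}
row_mean_of_squares_reference(
    rows=2,
    cols=3,
    input_fn=lambda r, c: data[r][c],
    output_fn=lambda r, v: results.__setitem__(r, v),
)
```

The callbacks mirror the roles of `input_fn` and `output_fn` in the signature above: the reduction itself never touches a buffer directly, so the caller decides how elements are loaded and where each row's result is stored. The sketch assumes `cols` is greater than zero.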