Mojo function
row_mean_of_squares_gpu
def row_mean_of_squares_gpu[in_dtype: DType, out_dtype: DType, //, input_fn: def[width: Int](row: Int, col: Int) capturing -> SIMD[in_dtype, width], output_fn: def(row: Int, val: Scalar[out_dtype]) capturing -> None, pdl_level: PDLLevel = PDLLevel.ON](rows: Int, cols: Int, ctx: DeviceContext)
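Launches the GPU mean-of-squares reduction, assigning one thread block per row. Input elements are read through input_fn, and the result for each row is written through output_fn.
The primary target is SM100 (B200), but the kernel uses only block_reduce, so it is portable to other GPUs.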
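As a reading aid, the quantity computed for each row can be written out explicitly. This is a sketch inferred from the function name and signature, not a statement from the source: it assumes the mean is taken over all `cols` elements of a row, that $x_{r,c}$ stands for the value `input_fn` returns at `(row, col) = (r, c)`, and that $y_r$ is the value passed to `output_fn` for row $r$. The symbols $R$, $C$, $x$, and $y$ are introduced here for illustration only. The accumulation precision and the conversion from `in_dtype` to `out_dtype` are not specified on this page.

```latex
% Assumed definition: R = rows, C = cols (illustrative symbols, not from the API)
\[
  y_r \;=\; \frac{1}{C} \sum_{c=0}^{C-1} x_{r,c}^{2},
  \qquad r = 0, \dots, R-1
\]

% Worked example for a single row x_r = (1, 2, 3), so C = 3:
\[
  y_r \;=\; \frac{1^2 + 2^2 + 3^2}{3} \;=\; \frac{14}{3} \;\approx\; 4.667
\]
```

Under this reading, each row is an independent reduction, which is consistent with the one-block-per-row launch described above.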