
Mojo function

row_mean_of_squares

def row_mean_of_squares[in_dtype: DType, out_dtype: DType, rank: Int, //, input_0_fn: def[width: Int, rank: Int](IndexList[rank]) capturing -> SIMD[in_dtype, width], output_0_fn: def(row: Int, val: Scalar[out_dtype]) capturing -> None, /, target: StringSlice[StaticConstantOrigin] = StringSlice("cpu")](shape: IndexList[rank], ctx: DeviceContext)

Computes the per-row mean of squares over the last axis, accumulating the sum in accum_type.

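The input is treated as flattened to [M, N], where N is the size of the last axis and M is the product of the remaining axes. For each row m, the function computes out[m] = sum_n(x[m,n]^2) / N and invokes output_0_fn(m, out[m]) exactly once, passing the result as an out_dtype scalar.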
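As a rough illustration of the arithmetic described above, here is a minimal reference model in plain Python. It is not the Mojo kernel: the function name `row_mean_of_squares_reference`, the flat row-major `data` buffer, and the use of a Python float as the accumulator are all assumptions made for this sketch. It also assumes the last axis is non-empty.

```python
from math import prod
from typing import Callable, Sequence


def row_mean_of_squares_reference(
    data: Sequence[float],
    shape: Sequence[int],
    output_fn: Callable[[int, float], None],
) -> None:
    """Reference model of the math only (hypothetical helper, not the Mojo API).

    `data` is assumed to be a flat, row-major buffer holding `shape` elements.
    """
    n = shape[-1]         # size of the reduced (last) axis
    m = prod(shape[:-1])  # all leading axes flattened into M rows
    for row in range(m):
        acc = 0.0         # a Python float stands in for the accumulation type
        for col in range(n):
            x = data[row * n + col]
            acc += x * x
        # One callback per row, mirroring output_0_fn(row, val).
        output_fn(row, acc / n)


# Usage: a (2, 1, 4) input flattens to M = 2 rows of N = 4 elements.
results = {}
row_mean_of_squares_reference(
    [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 2.0, 2.0],
    (2, 1, 4),
    lambda row, val: results.__setitem__(row, val),
)
print(results)  # {0: 7.5, 1: 2.0}
```

The first row gives (1 + 4 + 9 + 16) / 4 = 7.5 and the second gives (0 + 0 + 4 + 4) / 4 = 2.0, matching the formula out[m] = sum_n(x[m,n]^2) / N.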

Parameters:

in_dtype (DType): Element type of the input (e.g. bfloat16 or float32).
out_dtype (DType): Element type of the per-row result (typically float32).
rank (Int): Rank of the logical input shape.
input_0_fn (def[width: Int, rank: Int](IndexList[rank]) capturing -> SIMD[in_dtype, width]): Loads width contiguous input elements starting at the given index. The 2D [row, col] position is re-expressed as a rank-dimensional index before the call.
output_0_fn (def(row: Int, val: Scalar[out_dtype]) capturing -> None): Receives (row, value) once per row.
target (StringSlice[StaticConstantOrigin]): "cpu" (the default) or a GPU target string.
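The input_0_fn description mentions re-expressing a 2D [row, col] position as an n-D index. One plausible way to do that mapping is sketched below in plain Python. The row-major layout and the helper name `row_col_to_nd_index` are assumptions made for illustration; the source does not say how the library performs this conversion.

```python
from typing import Sequence, Tuple


def row_col_to_nd_index(
    row: int, col: int, shape: Sequence[int]
) -> Tuple[int, ...]:
    """Map a flattened (row, col) position to an index into `shape`.

    Hypothetical helper: assumes the leading axes are flattened in
    row-major order and the last axis is the column axis.
    """
    index = [0] * len(shape)
    index[-1] = col
    # Peel off the leading axes from innermost to outermost.
    for axis in range(len(shape) - 2, -1, -1):
        index[axis] = row % shape[axis]
        row //= shape[axis]
    return tuple(index)


# Usage: with shape (2, 3, 4), flattened row 4 is leading index (1, 1).
print(row_col_to_nd_index(4, 1, (2, 3, 4)))  # (1, 1, 1)
```

Under this assumption, a rank-2 shape maps [row, col] to itself, and higher ranks only unfold the row component.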

Args:

shape (IndexList[rank]): Logical input shape. Reduction runs over the last axis.
ctx (DeviceContext): Device context (ignored on CPU).