IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo function

group_norm_gpu_multi_block_stats

def group_norm_gpu_multi_block_stats[StatsLayoutType: TensorLayout, stats_origin: MutOrigin, //, dtype: DType, simd_width: Int, input_fn: def[width: Int](row: Int, col: Int) capturing -> SIMD[dtype, width]](stats: TileTensor[get_accum_type[dtype](), StatsLayoutType, stats_origin], num_splits: Int, group_size: Int)

Multi-block stats kernel: computes partial Welford statistics per split.

Grid: num_rows * num_splits blocks. Each block processes one split of one group and writes its partial statistics (mean, m2, count) to the stats buffer, where m2 is the running sum of squared deviations from the mean. Stats layout: stats[block_idx * 3 + {0, 1, 2}] = {mean, m2, count}.
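
The stats layout can be modelled on the CPU with a minimal Python sketch. This is a reference model under stated assumptions, not the Mojo kernel itself. It assumes blocks are ordered as block_idx = row * num_splits + split and that each group is cut into contiguous, equal-sized splits; neither detail is stated above. The helper names (`partial_welford`, `multi_block_stats`, `merge_row`) are hypothetical, and the merge step is a standard pairwise combination of Welford partials included only to show how the buffer could be consumed.

```python
import math


def partial_welford(values):
    """Single-pass Welford update over one split; returns (mean, m2, count)."""
    mean, m2, count = 0.0, 0.0, 0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the already-updated mean
    return mean, m2, float(count)


def multi_block_stats(rows, num_splits):
    """CPU model of the grid: one simulated block per (row, split) pair."""
    stats = [0.0] * (len(rows) * num_splits * 3)
    for row, data in enumerate(rows):
        # Assumption: contiguous splits of ceil(group_size / num_splits) elements.
        chunk = math.ceil(len(data) / num_splits)
        for split in range(num_splits):
            # Assumption: blocks are ordered row-major over (row, split).
            block_idx = row * num_splits + split
            mean, m2, count = partial_welford(data[split * chunk:(split + 1) * chunk])
            stats[block_idx * 3 + 0] = mean
            stats[block_idx * 3 + 1] = m2
            stats[block_idx * 3 + 2] = count
    return stats


def merge_row(stats, row, num_splits):
    """Combine one row's partials into (mean, population variance)."""
    mean, m2, count = 0.0, 0.0, 0.0
    for split in range(num_splits):
        base = (row * num_splits + split) * 3
        b_mean, b_m2, b_count = stats[base:base + 3]
        if b_count == 0:
            continue  # an empty split contributes nothing
        total = count + b_count
        delta = b_mean - mean
        mean += delta * b_count / total
        m2 += b_m2 + delta * delta * count * b_count / total
        count = total
    return mean, m2 / count
```

For example, calling `multi_block_stats([[1, 2, 3, 4, 5, 6, 7, 8]], num_splits=4)` fills a 12-element buffer, and `merge_row(stats, 0, 4)` then recovers a mean of about 4.5 and a population variance of about 5.25, matching a direct computation over the whole group.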