Mojo function

group_norm_gpu_multi_block_norm

def group_norm_gpu_multi_block_norm[OutputLayoutType: TensorLayout, output_origin: MutOrigin, StatsLayoutType: TensorLayout, stats_origin: MutOrigin, //, dtype: DType, simd_width: Int, input_fn: def[width: Int](row: Int, col: Int) capturing -> SIMD[dtype, width], gamma_fn: def[width: Int](IndexList[Int(1)]) capturing -> SIMD[dtype, width], beta_fn: def[width: Int](IndexList[Int(1)]) capturing -> SIMD[dtype, width]](output: TileTensor[dtype, OutputLayoutType, output_origin], stats: TileTensor[get_accum_type[dtype](), StatsLayoutType, stats_origin], epsilon: Scalar[dtype], num_groups: Int, channels_per_group: Int, spatial: Int, num_splits: Int, group_size: Int)

Multi-block normalize kernel: reduces partial stats and normalizes.

Grid: num_rows * num_splits blocks. Each block reads all partial statistics for its group, reduces them to the final mean and variance, then normalizes its own chunk of elements.
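The reduce-then-normalize step each block performs can be illustrated with a small CPU reference in Python. This is only a sketch under stated assumptions, not the kernel's implementation: the layout of `stats` is not documented here, so the example assumes each split stores a hypothetical (count, mean, M2) triple and merges them with the standard parallel variance update. The helper names `partial_stats`, `merge_stats`, and `normalize_chunk` are invented for illustration, and gamma and beta are simplified to scalars rather than the per-channel `gamma_fn` and `beta_fn` callbacks.

```python
import math


def partial_stats(chunk):
    """Stats for one split: element count, mean, and sum of squared deviations (M2).

    Assumption: the real kernel may store partial stats in a different form.
    """
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge_stats(a, b):
    """Combine two partial (count, mean, M2) triples with the parallel variance update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def normalize_chunk(chunk, all_partials, epsilon, gamma=1.0, beta=0.0):
    """Mimic one block: reduce every partial for the group, then normalize its own chunk."""
    total = all_partials[0]
    for p in all_partials[1:]:
        total = merge_stats(total, p)
    n, mean, m2 = total
    inv_std = 1.0 / math.sqrt(m2 / n + epsilon)  # biased variance, as in group norm
    return [(x - mean) * inv_std * gamma + beta for x in chunk]


# One group of 8 elements split across num_splits = 2 "blocks".
group = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
num_splits = 2
chunk_len = len(group) // num_splits
chunks = [group[i * chunk_len:(i + 1) * chunk_len] for i in range(num_splits)]

# Phase 1 (done by an earlier kernel in the multi-block scheme): one partial per split.
partials = [partial_stats(c) for c in chunks]

# Phase 2 (this kernel): every block reduces all partials, then normalizes its chunk.
normalized = [y for c in chunks for y in normalize_chunk(c, partials, epsilon=1e-5)]
```

Every block repeats the same small reduction over `num_splits` partials instead of synchronizing across blocks, which is cheap as long as `num_splits` stays small relative to the chunk size.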