Mojo module
normalization
Functions
- apply_qk_rms_norm: Fused per-element QK-RMSNorm apply for two operands Q and K.
- apply_qk_rms_norm_cpu: Naive CPU reference path (also used as a correctness oracle).
- apply_qk_rms_norm_gpu: Launches the fused Q/K RMSNorm apply: one launch, grid (rows, 2).
- apply_qk_rms_norm_gpu_block: Fused per-element QK-RMSNorm apply for Q and K in a single launch.
- block_reduce
- block_reduce_dual_sum: Combined block reduction for two sums using only 2 barriers.
- group_norm
- group_norm_gpu
- group_norm_gpu_block
- group_norm_gpu_multi_block_norm: Multi-block normalize kernel: reduces partial stats and normalizes.
- group_norm_gpu_multi_block_stats: Multi-block stats kernel: computes partial Welford statistics per split.
- group_norm_gpu_warp_tiling
- group_norm_reshape: Reshapes an input buffer for group normalization by flattening all dimensions except the group dimension. Returns a 2D buffer of shape (num_groups * N, group_size), where group_size is the product of channels_per_group and spatial.
- layer_norm
- layer_norm_cpu: Computes layernorm(elementwise_fn(x)) across the last dimension of x, where layernorm is defined as $(x - \mathrm{mean}(x)) / \sqrt{\mathrm{var}(x) + \epsilon} \cdot \gamma + \beta$.
- layer_norm_gpu
- layer_norm_gpu_block
- layer_norm_gpu_warp_tiling
- layer_norm_reshape
- layer_norm_shape: Compute the output shape of a layer_norm operation.
- rms_norm
- rms_norm_cpu
- rms_norm_fused_residual_add
- rms_norm_fused_residual_add_cpu
- rms_norm_fused_residual_add_gpu
- rms_norm_fused_residual_add_gpu_block
- rms_norm_fused_residual_add_gpu_block_no_shmem: RMS norm fused with residual add, without shared memory reductions.
- rms_norm_fused_residual_add_gpu_warp_tiling
- rms_norm_gpu
- rms_norm_gpu_block
- rms_norm_gpu_warp_tiling
- rms_norm_gpu_warp_tiling_128
- rms_norm_rope_gpu: Fused RMS normalization followed by Rotary Position Embedding (RoPE) for GPU.
- row_mean_of_squares: Per-row mean of squares over the last axis, accumulated in accum_type.
- row_mean_of_squares_cpu: Naive CPU reference path (also used as a correctness oracle).
- row_mean_of_squares_gpu: Launches the GPU mean-of-squares reduction: one block per row.
- row_mean_of_squares_gpu_block
- row_mean_of_squares_qk: Fused per-row mean of squares for two operands Q and K.
- row_mean_of_squares_qk_cpu: Naive CPU reference path (also used as a correctness oracle).
- row_mean_of_squares_qk_gpu: Launches the fused Q/K mean-of-squares reduction: one launch, grid (rows, 2).
- row_mean_of_squares_qk_gpu_block: Fused per-row mean of squares for Q and K in a single launch.
- welford_block_all_reduce
- welford_combine
- welford_update
- welford_warp_reduce
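
Several entries in this list have no description, so a small reference sketch of the underlying math may help when reading the names. The Python below is not the Mojo API: every function name (note the `_ref` suffix), argument list, and the `eps` default is a hypothetical chosen for illustration. It assumes the textbook definitions: the per-row mean of squares, RMS normalization as `x / sqrt(mean(x^2) + eps) * gamma`, and Welford's running (mean, M2, count) statistics with the usual pairwise merge, which is the kind of partial-statistics combination the multi-block and warp-reduce entries suggest.

```python
import math


def row_mean_of_squares_ref(rows):
    """Mean of x*x over the last axis, one value per row."""
    return [sum(x * x for x in row) / len(row) for row in rows]


def rms_norm_ref(row, gamma, eps=1e-6):
    """Textbook RMS norm of one row: x / sqrt(mean(x^2) + eps) * gamma.

    `gamma` is a per-element weight; `eps` is an assumed default,
    not a value taken from the Mojo module.
    """
    mean_sq = sum(x * x for x in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * g for x, g in zip(row, gamma)]


def welford_update_ref(state, x):
    """Fold one sample into a running (mean, m2, count) triple.

    m2 is the running sum of squared deviations from the mean,
    so the population variance is m2 / count.
    """
    mean, m2, count = state
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
    return (mean, m2, count)


def welford_combine_ref(a, b):
    """Merge two partial (mean, m2, count) triples into one."""
    mean_a, m2_a, n_a = a
    mean_b, m2_b, n_b = b
    n = n_a + n_b
    if n == 0:
        return (0.0, 0.0, 0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (mean, m2, n)


def welford_of_ref(values):
    """Run welford_update_ref over a sequence, starting from the empty state."""
    state = (0.0, 0.0, 0)
    for v in values:
        state = welford_update_ref(state, v)
    return state
```

Under these assumptions, splitting a row into chunks, calling `welford_of_ref` on each chunk, and merging the partial results with `welford_combine_ref` gives the same mean and variance as a single pass over the whole row. That property is what makes it possible to compute statistics in parallel pieces and reduce them afterwards.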