
Mojo function

rms_norm_rope_gpu

def rms_norm_rope_gpu[input_dtype: DType, cos_sin_dtype: DType, rank: Int, //, input_fn: def[width: Int, rank: Int, alignment: Int](IndexList[rank]) capturing -> SIMD[input_dtype, width], cos_fn: def[width: Int, rank: Int, alignment: Int](IndexList[rank]) capturing -> SIMD[cos_sin_dtype, width], sin_fn: def[width: Int, rank: Int, alignment: Int](IndexList[rank]) capturing -> SIMD[cos_sin_dtype, width], output_fn: def[width: Int, alignment: Int](IndexList[rank], SIMD[input_dtype, width]) capturing -> None, multiply_before_cast: Bool, pdl_level: PDLLevel = PDLLevel.ON](shape: IndexList[rank, element_type=shape.element_type], gamma: TileTensor[input_dtype, Storage=gamma.Storage, address_space=gamma.address_space, linear_idx_type=gamma.linear_idx_type, element_size=gamma.element_size], epsilon: Scalar[input_dtype], weight_offset: Scalar[input_dtype], cos_vals: TileTensor[cos_sin_dtype, Storage=cos_vals.Storage, address_space=cos_vals.address_space, linear_idx_type=cos_vals.linear_idx_type, element_size=cos_vals.element_size], sin_vals: TileTensor[cos_sin_dtype, Storage=sin_vals.Storage, address_space=sin_vals.address_space, linear_idx_type=sin_vals.linear_idx_type, element_size=sin_vals.element_size], ctx: DeviceContext)

Fused RMS normalization followed by Rotary Position Embedding (RoPE) for GPU.

Computes:

    normed = rms_norm(input, gamma, epsilon, weight_offset)
    x1, x2 = split(normed, axis=-1)  # halves along last dim
    rotated = concat(-x2, x1, axis=-1)
    output = normed * cos_vals + rotated * sin_vals

The size of the last dimension must be known and even, because the normalized values are split into two equal halves along it.
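
The computation described above can be sketched as a plain-Python reference for a single row (one slice along the last dimension). This is an illustrative sketch, not the GPU kernel. The function names `rms_norm` and `rms_norm_rope_reference` are hypothetical. The RMS norm formula used here, `x / sqrt(mean(x^2) + epsilon) * (gamma + weight_offset)`, is an assumption about how `weight_offset` is applied. The sketch also ignores `multiply_before_cast` and all dtype casting.

```python
import math


def rms_norm(row, gamma, epsilon, weight_offset):
    # Assumed definition (not confirmed by this page):
    #   x / sqrt(mean(x^2) + epsilon) * (gamma + weight_offset)
    mean_sq = sum(v * v for v in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + epsilon)
    return [v * inv_rms * (g + weight_offset) for v, g in zip(row, gamma)]


def rms_norm_rope_reference(row, gamma, epsilon, weight_offset, cos_vals, sin_vals):
    # The last dimension is split into two equal halves, so it must be even.
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")

    normed = rms_norm(row, gamma, epsilon, weight_offset)

    # x1, x2 = split(normed, axis=-1)
    half = len(normed) // 2
    x1, x2 = normed[:half], normed[half:]

    # rotated = concat(-x2, x1, axis=-1)
    rotated = [-v for v in x2] + x1

    # output = normed * cos_vals + rotated * sin_vals
    return [n * c + r * s for n, c, r, s in zip(normed, cos_vals, rotated, sin_vals)]
```

With `cos_vals` all ones and `sin_vals` all zeros the rotation is the identity, so the result is just the normalized row. That makes a convenient sanity check when comparing against the fused kernel's output.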