Mojo function
rms_norm_gpu_warp_tiling
rms_norm_gpu_warp_tiling[
    dtype: DType, //,
    simd_width: Int,
    max_warps_per_block: Int,
    input_fn: fn[Int](row: Int, col: Int) capturing -> SIMD[dtype, $0],
    output_fn: fn[Int, Int](row: Int, col: Int, val: SIMD[dtype, $0]) capturing -> None,
    multiply_before_cast: Bool
](
    gamma: NDBuffer[dtype, 1, MutableAnyOrigin],
    epsilon: SIMD[dtype, 1],
    weight_offset: SIMD[dtype, 1],
    num_cols: Int
)
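The signature alone does not say how the arguments combine, so the following is only a rough reference sketch of standard RMS normalization over one row, written in plain Python rather than Mojo. It assumes that `weight_offset` is added to `gamma` before scaling and that `epsilon` is added to the mean of squares; neither is confirmed by this page. The helper name `rms_norm_row` is hypothetical, and the sketch does not model warp tiling, `simd_width`, `input_fn`/`output_fn`, or `multiply_before_cast` (Python floats have a single precision).

```python
import math


def rms_norm_row(row, gamma, epsilon, weight_offset):
    """Reference RMS normalization of one row (hypothetical helper).

    Assumption: out[i] = row[i] / sqrt(mean(row^2) + epsilon) * (gamma[i] + weight_offset)
    """
    num_cols = len(row)
    # Mean of squared elements across the row.
    mean_square = sum(x * x for x in row) / num_cols
    # Reciprocal root-mean-square, stabilized by epsilon.
    inv_rms = 1.0 / math.sqrt(mean_square + epsilon)
    # Scale each normalized element by its (offset) weight.
    return [x * inv_rms * (g + weight_offset) for x, g in zip(row, gamma)]


row = [3.0, 4.0]
out = rms_norm_row(row, gamma=[1.0, 1.0], epsilon=0.0, weight_offset=0.0)
```

With unit weights and zero epsilon, the output row has a mean square of 1, which is a quick sanity check on any implementation of this formula.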