Mojo function

gemv_split_k

def gemv_split_k[c_type: DType, a_type: DType, b_type: DType, c_layout: TensorLayout, a_layout: TensorLayout, b_layout: TensorLayout, simd_width: Int, tile_m: Int, tile_n: Int, num_threads: Int, unroll_factor: Int = Int(2), elementwise_lambda_fn: Optional[def[dtype: DType, width: SIMDSize, *, alignment: Int = Int(1)](IndexList[Int(2)], SIMD[dtype, width]) capturing -> None] = None, accum_type: DType = get_accum_type[c_type](), check_bounds_m: Bool = True, check_bounds_n: Bool = True, pdl_level: PDLLevel = PDLLevel()](output: TileTensor[c_type, c_layout, MutAnyOrigin], act: TileTensor[a_type, a_layout, ImmutAnyOrigin], weight: TileTensor[b_type, b_layout, ImmutAnyOrigin], m: Int, n: Int, k: Int)

GEMV with tiling in the K dimension. Assuming the B (weight) matrix is transposed, i.e. row-major N x K, this kernel multiplies a vector (1 x K) by a matrix (N x K). The implementation can handle M > 1, but it is only optimal for tiny M; we use it for M = 1 only.
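The computation described above can be modeled in plain Python as a reference for checking results. This is a minimal sketch of the math only, not the kernel's implementation: the function name `gemv_split_k_reference` and the `k_chunk` parameter are hypothetical, and how the real kernel partitions K across threads is not specified here.

```python
def gemv_split_k_reference(act, weight, k_chunk=4):
    """Plain-Python model of the result, not of the GPU kernel.

    act    : M x K nested list (activations; M is usually 1)
    weight : N x K nested list (the transposed, row-major B matrix)
    k_chunk: hypothetical chunk size used to walk the K dimension
    """
    m = len(act)
    n = len(weight)
    k = len(act[0]) if m else 0
    output = [[0.0] * n for _ in range(m)]
    for row in range(m):
        for col in range(n):
            acc = 0.0
            # Walk K in chunks; each chunk produces a partial dot product
            # that is folded into the accumulator.
            for k0 in range(0, k, k_chunk):
                k1 = min(k0 + k_chunk, k)
                acc += sum(act[row][i] * weight[col][i] for i in range(k0, k1))
            output[row][col] = acc
    return output


# One activation row (M = 1, K = 5) against two weight rows (N = 2).
result = gemv_split_k_reference(
    [[1, 2, 3, 4, 5]],
    [[1, 0, 0, 0, 1], [1, 1, 1, 1, 1]],
)
```

Because each weight row is laid out along K, every output element is a dot product of two contiguous length-K rows, which is why the transposed layout is assumed.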

The launch grid covers ceildiv(m, tile_m) * tile_m rows and ceildiv(n, tile_n) * tile_n columns. When m or n is not a multiple of its tile size, the final blocks therefore read and write past the buffers unless the bounds guards are on. Setting check_bounds_m=False is only safe when the launcher guarantees m % tile_m == 0; because m is a runtime value, tile_m == 1 is the usual way to guarantee it. Setting check_bounds_n=False is only safe when n % tile_n == 0.
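The rounding arithmetic behind this rule can be checked with a few lines of plain Python. This is an illustrative sketch only: the helper names `covered_extent` and `guard_can_be_disabled` are hypothetical and are not part of the kernel's API, and the sizes used are made-up examples.

```python
def ceildiv(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def covered_extent(size, tile):
    """Rows or columns spanned by the launch grid along one dimension."""
    return ceildiv(size, tile) * tile


def guard_can_be_disabled(size, tile):
    """True only when the grid covers exactly `size` elements, so no block
    can index past the buffer along this dimension."""
    return size % tile == 0


# Example sizes (assumed for illustration): n = 1000 with tile_n = 64.
n, tile_n = 1000, 64
overhang = covered_extent(n, tile_n) - n  # columns the last block would overrun

# With tile_m == 1 the row guard is never needed, whatever m is at run time.
row_guard_needed = [not guard_can_be_disabled(m, 1) for m in (1, 3, 17)]
```

In this example the grid spans 1024 columns for a 1000-column output, so the last block would touch 24 columns beyond the buffer if check_bounds_n were disabled.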