
Mojo function

gemv_split_k

def gemv_split_k[c_type: DType, a_type: DType, b_type: DType, c_layout: TensorLayout, a_layout: TensorLayout, b_layout: TensorLayout, simd_width: Int, tile_m: Int, tile_n: Int, num_threads: Int, unroll_factor: Int = 2, elementwise_lambda_fn: Optional[def[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], SIMD[dtype, width]) capturing -> None] = None, accum_type: DType = get_accum_type[c_type](), check_bounds: Bool = True, pdl_level: PDLLevel = PDLLevel()](output: TileTensor[c_type, c_layout, MutAnyOrigin], act: TileTensor[a_type, a_layout, ImmutAnyOrigin], weight: TileTensor[b_type, b_layout, ImmutAnyOrigin], m: Int, n: Int, k: Int)

GEMV with tiling in the K dimension (split-K). The kernel assumes the B (weight) matrix is stored transposed, i.e. row-major N x K. It multiplies a (1 x K) activation vector by the transpose of that (N x K) weight matrix, producing a (1 x N) output. The implementation also handles M > 1, but it is only efficient for very small M, so in practice we use it only for M = 1.
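The arithmetic described above can be modeled in plain Python to show what "split-K" means: each output element `out[i][j]` is the dot product of activation row `i` with weight row `j` (both length K). K is cut into chunks, each chunk produces a partial sum, and the partial sums are reduced at the end. This is a hypothetical reference sketch, not the Mojo kernel or its API. The name `gemv_split_k_reference` and the chunk size `tile_k` are illustrative assumptions (the real signature exposes `tile_m`, `tile_n`, `num_threads`, and `simd_width` instead), and the way K is chunked here is not necessarily the kernel's actual thread mapping.

```python
def gemv_split_k_reference(act, weight, m, n, k, tile_k):
    """Hypothetical reference model of a split-K GEMV.

    act:    m x k activations (row-major nested lists)
    weight: n x k weights, i.e. the "transposed" row-major N x K layout
    Returns an m x n result where out[i][j] = sum_t act[i][t] * weight[j][t].
    """
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Assumed partitioning: one partial sum per K chunk, standing in
            # for the work a single split-K worker would do.
            partials = []
            for k0 in range(0, k, tile_k):
                k1 = min(k0 + tile_k, k)  # clamp the last, possibly short, chunk
                partials.append(sum(act[i][t] * weight[j][t] for t in range(k0, k1)))
            # Final reduction of the partial sums (on a GPU, a warp/block reduction).
            out[i][j] = sum(partials)
    return out


# Usage: M = 1, N = 2, K = 5, split K into chunks of 2.
result = gemv_split_k_reference(
    [[1, 2, 3, 4, 5]],
    [[1, 0, 1, 0, 1],
     [2, 2, 2, 2, 2]],
    m=1, n=2, k=5, tile_k=2,
)
print(result)  # [[9, 30]]
```

Because the weight is row-major N x K, the dot product for each output element reads one contiguous weight row along K. That is presumably why the kernel expects this transposed layout, since it lets each worker stream its K slice with contiguous (SIMD-friendly) loads.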