
Mojo function

dispatch_gemv

def dispatch_gemv[c_type: DType, a_type: DType, b_type: DType, //, transpose_b: Bool = False, elementwise_lambda_fn: Optional[def[dtype: DType, width: SIMDSize, *, alignment: Int = Int(1)](IndexList[Int(2)], SIMD[dtype, width]) capturing -> None] = None, elementwise_lambda_wrapper: Optional[def[dtype: DType, width: SIMDSize, *, alignment: Int = Int(1)](IndexList[Int(2)], SIMD[dtype, width]) capturing -> None] = None, elementwise_compute_lambda_fn: Optional[def[dtype: DType, width: SIMDSize, *, alignment: Int = Int(1)](IndexList[Int(2)], SIMD[dtype, width]) capturing -> SIMD[dtype, width]] = None, pdl_level: PDLLevel = PDLLevel()](c: TileTensor[c_type, Storage=c.Storage, address_space=c.address_space, linear_idx_type=c.linear_idx_type, element_size=c.element_size], a: TileTensor[a_type, Storage=a.Storage, address_space=a.address_space, linear_idx_type=a.linear_idx_type, element_size=a.element_size], b: TileTensor[b_type, Storage=b.Storage, address_space=b.address_space, linear_idx_type=b.linear_idx_type, element_size=b.element_size], ctx: DeviceContext)

Dispatches an M=1 (or N=1) matmul to either the GEMV kernel or the SM100 GEMM kernel, depending on the (N, K) shape.

For most M=1 shapes GEMV is preferred, but for certain large (N, K) combinations the SM100 GEMM kernel achieves higher throughput. Add new (N, K) pairs to SM100_GEMV_SHAPES as they are identified through benchmarking.
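The routing rule above amounts to a membership test on the (N, K) pair. The Python sketch below is illustrative only and is not the Mojo implementation: `EXAMPLE_SHAPES` holds made-up placeholder pairs (the real contents of SM100_GEMV_SHAPES are not listed here), and `pick_kernel` is a hypothetical helper name.

```python
# Placeholder (N, K) pairs, invented for illustration. The real
# SM100_GEMV_SHAPES entries are identified through benchmarking.
EXAMPLE_SHAPES = {(8192, 4096), (16384, 8192)}


def pick_kernel(n: int, k: int, shapes=EXAMPLE_SHAPES) -> str:
    """Return the kernel this sketch would choose for an M=1 matmul."""
    # Shapes found in the table go to the SM100 GEMM kernel;
    # everything else stays on the default GEMV path.
    return "sm100_gemm" if (n, k) in shapes else "gemv"


print(pick_kernel(4096, 4096))  # not in the table, so GEMV
print(pick_kernel(8192, 4096))  # in the table, so SM100 GEMM
```

Under this reading, adding a newly benchmarked shape means adding one more pair to the table, with no change to the dispatch logic itself.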

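N=1 always routes to GEMV, because SM100 TMA requires N * sizeof(c_type) % 16 == 0. With N=1 that product is just sizeof(c_type), which is below 16 bytes for common element types such as bfloat16 or float32.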
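The alignment condition can be checked with simple arithmetic. The Python sketch below is illustrative only and is not the Mojo implementation: `row_is_tma_aligned` is a hypothetical helper name, and the element sizes used are the standard 2 bytes for bfloat16 and 4 bytes for float32.

```python
TMA_ALIGNMENT_BYTES = 16  # alignment stated in the condition above


def row_is_tma_aligned(n: int, elem_bytes: int) -> bool:
    """Check N * sizeof(c_type) % 16 == 0 for a given element size in bytes."""
    return (n * elem_bytes) % TMA_ALIGNMENT_BYTES == 0


print(row_is_tma_aligned(1, 2))  # False: N=1 with bfloat16 is 2 bytes
print(row_is_tma_aligned(1, 4))  # False: N=1 with float32 is 4 bytes
print(row_is_tma_aligned(8, 2))  # True: N=8 with bfloat16 is 16 bytes
```

For these element sizes the N=1 case fails the check, which is consistent with N=1 always taking the GEMV path.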