IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo function

gemv_and_partial_norm

def gemv_and_partial_norm[c_type: DType, a_type: DType, //, *, transpose_b: Bool = True, fused: Bool = True, tile_n: Int = 4, num_threads: Int = 256, pdl_level: PDLLevel = PDLLevel()](normed_output: TileTensor[c_type, address_space=normed_output.address_space, linear_idx_type=normed_output.linear_idx_type, element_size=normed_output.element_size], unnormed_output: TileTensor[c_type, address_space=unnormed_output.address_space, linear_idx_type=unnormed_output.linear_idx_type, element_size=unnormed_output.element_size], act: TileTensor[a_type, address_space=act.address_space, linear_idx_type=act.linear_idx_type, element_size=act.element_size], weight: TileTensor[a_type, address_space=weight.address_space, linear_idx_type=weight.linear_idx_type, element_size=weight.element_size], gamma: TileTensor[a_type, address_space=gamma.address_space, linear_idx_type=gamma.linear_idx_type, element_size=gamma.element_size], eps: Scalar[a_type], ctx: DeviceContext)

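Computes y = act @ weight.T, then splits the columns of y into two parts: a front segment that is RMS-normalized (scaled by gamma, with eps for numerical stability) and written to normed_output, and a tail segment that is written unnormalized to unnormed_output.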
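To make the data flow concrete, here is a pure-Python sketch of the math on the host. It is not the MAX kernel and ignores dtypes, tiling, and PDL. It rests on a few assumptions that this page does not state: the split point equals the width of gamma (and of normed_output), the RMS statistic is taken over the front columns only, and the norm is v / sqrt(mean(v²) + eps) * gamma. The helper name gemv_and_partial_norm_ref is hypothetical.

```python
import math


def gemv_and_partial_norm_ref(act, weight, gamma, eps, transpose_b=True):
    """Hypothetical host-side reference model, not the MAX API.

    act:    M x K activations (list of rows)
    weight: N x K if transpose_b else K x N (list of rows)
    gamma:  scale for the normed front; its length is ASSUMED to be the split point
    Returns (normed, unnormed) as M x N_norm and M x (N - N_norm) lists.
    """
    M, K = len(act), len(act[0])
    N = len(weight) if transpose_b else len(weight[0])
    n_norm = len(gamma)  # assumption: width of the normed front
    normed, unnormed = [], []
    for m in range(M):
        # y[m, n] = sum_k act[m, k] * weight.T[k, n]
        y = [
            sum(act[m][k] * (weight[n][k] if transpose_b else weight[k][n])
                for k in range(K))
            for n in range(N)
        ]
        front, tail = y[:n_norm], y[n_norm:]
        # Assumed "partial" semantics: RMS is computed over the front only.
        rms = math.sqrt(sum(v * v for v in front) / n_norm + eps)
        normed.append([v / rms * g for v, g in zip(front, gamma)])
        unnormed.append(tail)  # tail columns pass through unnormalized
    return normed, unnormed


normed, unnormed = gemv_and_partial_norm_ref(
    [[1.0, 2.0]], [[1, 0], [0, 1], [1, 1]], [1.0, 1.0], 0.0)
print(normed, unnormed)
```

With act = [[1.0, 2.0]] and a 3 x 2 weight, y = [1, 2, 3]; the two-element gamma selects the first two columns for normalization (each divided by sqrt(2.5)), and [3.0] comes back as the unnormed tail.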

Parameters:

  • c_type (DType): Output dtype.
  • a_type (DType): Dtype of act, weight, and gamma.
  • transpose_b (Bool): If True, weight is stored row-major with shape [N, K] and used as weight.T.
  • fused (Bool): Compile-time flag. True (default) selects the single-kernel fused path, which supports only M=1. False selects the two-launch baseline (matmul followed by rms_norm_gpu); in that path the unnormed tail is a view into the matmul output, so unnormed_output is left untouched.
  • tile_n (Int): Compile-time tile width in columns (fused path only).
  • num_threads (Int): Compile-time number of threads per block (fused path only).
  • pdl_level (PDLLevel): Programmatic Dependent Launch level.
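The sketch below illustrates what transpose_b and tile_n mean for a single output row (the M=1 case), using a flat row-major weight buffer. It is a hypothetical illustration, not the fused kernel's actual scheduling: it assumes that transpose_b=False means weight is stored row-major as [K, N], and it simply walks the N output columns in groups of tile_n, letting the last group be partial. The function name gemv_tiled is made up for this example.

```python
def gemv_tiled(act_row, weight_flat, N, K, transpose_b=True, tile_n=4):
    """Hypothetical illustration: compute one output row tile_n columns at a time.

    weight_flat is a row-major flat buffer:
      transpose_b=True  -> shape [N, K], element (n, k) at index n * K + k
      transpose_b=False -> shape [K, N], element (k, n) at index k * N + n (assumed)
    """
    y = [0.0] * N
    for tile_start in range(0, N, tile_n):
        tile_end = min(tile_start + tile_n, N)  # the last tile may be partial
        for n in range(tile_start, tile_end):
            acc = 0.0
            for k in range(K):
                idx = n * K + k if transpose_b else k * N + n
                acc += act_row[k] * weight_flat[idx]
            y[n] = acc
    return y


# Same 5 x 2 weight stored as [N, K] and as [K, N].
w_nk = [1, 0, 0, 1, 1, 1, 2, 0, 0, 3]
w_kn = [1, 0, 1, 2, 0, 0, 1, 1, 0, 3]
print(gemv_tiled([1.0, 2.0], w_nk, 5, 2, transpose_b=True))
print(gemv_tiled([1.0, 2.0], w_kn, 5, 2, transpose_b=False))
```

Both calls produce the same row, since only the storage layout differs. Changing tile_n regroups the columns without changing the result, which makes it a quick sanity check for any tiling.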

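Args:

  • normed_output (TileTensor): Destination for the RMS-normalized front columns of y.
  • unnormed_output (TileTensor): Destination for the remaining tail columns of y, written without normalization. Left untouched when fused is False.
  • act (TileTensor): Activation input (M rows; M must be 1 on the fused path).
  • weight (TileTensor): Weight matrix; row-major [N, K] when transpose_b is True.
  • gamma (TileTensor): Scale applied by the RMS norm to the normed front.
  • eps (Scalar[a_type]): Epsilon added inside the RMS norm for numerical stability.
  • ctx (DeviceContext): Device context used to launch the kernel(s).
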
Raises:

Error: If _matmul_gpu or rms_norm_gpu fails to launch, or if internal scratch allocation fails.