Skip to main content

Mojo module

mla_graph

Functions

  • concat_q_nope_proj_rope: Concatenate projected q_nope and q_rope into a single q tensor.
  • fused_rope_rmsnorm_kernel: Fused GPU kernel that applies RoPE to query projections and RMSNorm to KV cache.
  • mla_decode_branch_bf16: BF16 MLA decode path.
  • mla_decode_branch_fp8: This is a manually fused kernel that performs the following operations: - Apply RoPE to the query and the key cache (in-place). - Apply RMSNorm to the non-rope portion of the key cache (in-place). - Project q_nope to kv_latent_dim through a fp8 batched matmul: q_nope_proj = q_nope_t @ w_uk. - Concatenate q_nope_proj and q_rope: q_full = concat(q_nope_proj, q_rope, axis=2). - Perform MLA decode. - Project raw_output to v_head_dim through another fp8 batched matmul: output = raw_output_t @ w_uv.
  • mla_fused_rope_rmsnorm: Launches the fused RoPE and RMSNorm kernel for MLA attention.
  • mla_prefill_branch_bf16: BF16 MLA prefill path.
  • mla_prefill_branch_fp8: This is a manually fused kernel that performs the following operations: - Apply RoPE to the query and the key cache (in-place). - Apply RMSNorm to the non-rope portion of the key cache (in-place). - Copy the KV latent values from PagedKVCache to a contiguous buffer. - Quantize the KV latent values to fp8. - Up-project the latent KV values to full K and V through a matmul. - Split the concatenated KV into K and V. - Perform MLA prefill.
  • mla_prefill_decode_graph_bf16: BF16 MLA prefill/decode graph.
  • mla_prefill_decode_graph_fp8: This is a manually fused kernel that performs the following operations: - Perform MLA prefill or decode based on the maximum sequence length.
  • quantize_and_bmm_fp8_helper: Helper function to quantize and perform a batched matrix multiplication. This function uses the transposed view of the input tensor a.
  • split_kv_buffer: Split a packed KV buffer into separate K and V tensors.
  • transpose_helper: Helper function to transpose a tensor from [B, N, K] to [N, B, K] (or vice versa).

Was this page helpful?