Mojo module
mla_graph
Functions
- mla_prefill_branch_fp8: A manually fused kernel that performs the following operations:
  - Copy the KV latent values from PagedKVCache to a contiguous buffer.
  - Quantize the KV latent values to fp8.
  - Up-project the latent KV values to full K and V through a matmul.
  - Split the concatenated KV into K and V.
  - Perform MLA prefill.
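
To make the data flow of the five steps concrete, here is a minimal pure-Python reference sketch. It is not the Mojo kernel and does not use its API: every name below (`fake_fp8`, `gather_latents`, `mla_prefill_reference`, the page layout, the `w_kv` weight) is hypothetical. It also assumes a single head, no rotary components, one per-tensor scale, causal masking, and fp8 emulated by rounding to an E4M3-like precision. The real kernel may differ on all of these points.

```python
import math

FP8_MAX = 448.0  # assumption: largest finite value of an E4M3-style format


def fake_fp8(x):
    """Emulate fp8 rounding: clamp, then keep 4 significant bits.

    Simplification: subnormals, NaN and the exponent floor are ignored.
    """
    if x == 0.0:
        return 0.0
    x = max(-FP8_MAX, min(FP8_MAX, x))
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 16) / 16, e)


def gather_latents(pages, page_table, seq_len):
    """Step 1: copy latent rows from a paged layout into one contiguous list."""
    page_size = len(pages[0])
    return [list(pages[page_table[t // page_size]][t % page_size])
            for t in range(seq_len)]


def quantize(rows):
    """Step 2: per-tensor scaling followed by emulated fp8 rounding."""
    amax = max((abs(v) for r in rows for v in r), default=0.0)
    scale = amax / FP8_MAX if amax > 0 else 1.0
    return [[fake_fp8(v / scale) for v in r] for r in rows], scale


def matmul(a, b):
    """Plain row-by-column matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def causal_attention(q, k, v):
    """Step 5 (simplified): single-head causal softmax attention."""
    inv_sqrt_d = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [inv_sqrt_d * sum(a * b for a, b in zip(qi, k[j]))
                  for j in range(i + 1)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(weights[j] * v[j][c] for j in range(i + 1)) / total
                    for c in range(len(v[0]))])
    return out


def mla_prefill_reference(q, pages, page_table, seq_len, w_kv, d_k):
    latents = gather_latents(pages, page_table, seq_len)   # step 1
    q_latents, scale = quantize(latents)                   # step 2
    # Step 3: up-project to concatenated [K | V], then undo the scale.
    kv = [[scale * x for x in row] for row in matmul(q_latents, w_kv)]
    k = [row[:d_k] for row in kv]                          # step 4
    v = [row[d_k:] for row in kv]
    return causal_attention(q, k, v)                       # step 5


# Usage: two pages of two latent rows each, visited in the order [1, 0].
pages = [[[0.5, 0.5], [9.0, 9.0]], [[1.0, 2.0], [3.0, 4.0]]]
w_kv = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # K and V both copy the latent
result = mla_prefill_reference([[1.0, 0.0]], pages, [1, 0], 1, w_kv, d_k=2)
```

In this sketch each step materializes an intermediate list. A fused kernel would run the same five steps together rather than as separate calls.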