Mojo module
mla_index_fp8
MLA FP8 index kernel for computing attention scores with paged KV cache.
Functions
- apply_mask_kernel: Apply causal mask to the output scores.
- fill_invalid_topk_kernel: Fill invalid positions with -1 in topk output.
- mla_indexer_ragged_float8_paged: Compute FP8 indexed attention scores using paged KV cache and return top-k indices.
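The function summaries above suggest a three-step flow: mask the scores causally, select the top-k key indices per query, and fill unused slots with -1. The following is a minimal pure-Python sketch of that flow, offered only as a conceptual reference. The function names, signatures, and data layout here are hypothetical and are not the module's API; FP8 quantization, ragged batching, and paged KV cache lookup are not modeled, and the tie-breaking rule is an assumption.

```python
import math

def apply_causal_mask(scores, query_positions):
    """Hypothetical helper: return a copy of `scores` in which every key
    index greater than the query's position is set to -inf."""
    masked = []
    for row, q_pos in zip(scores, query_positions):
        masked.append([
            s if key_idx <= q_pos else -math.inf
            for key_idx, s in enumerate(row)
        ])
    return masked

def topk_indices_with_fill(masked_scores, k):
    """Hypothetical helper: return the indices of the k highest scores per
    row, padding with -1 when fewer than k keys survive the mask."""
    result = []
    for row in masked_scores:
        valid = [i for i, s in enumerate(row) if s != -math.inf]
        # Highest score first; ties go to the lower index (an assumption).
        valid.sort(key=lambda i: (-row[i], i))
        top = valid[:k]
        result.append(top + [-1] * (k - len(top)))
    return result

# One row of scores per query token, one column per key position.
scores = [
    [0.9, 0.1, 0.5, 0.7],
    [0.2, 0.8, 0.3, 0.6],
    [0.4, 0.3, 0.9, 0.1],
]
query_positions = [0, 1, 3]  # absolute position of each query token

masked = apply_causal_mask(scores, query_positions)
topk = topk_indices_with_fill(masked, k=2)
print(topk)  # [[0, -1], [1, 0], [2, 0]]
```

The first query can see only key 0, so its second slot is filled with -1. This corresponds to the "invalid position" case that fill_invalid_topk_kernel is described as handling.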