Mojo module
kv_cache
Functions
- copy_kv_pages_d2h: Copy selected pages for a single layer from device to host KV cache.
- generic_flash_attention_kv_cache_padded
- generic_flash_attention_kv_cache_padded_materialized_mask
- generic_fused_qk_rope_bshd_continuous_batch: Performs a fused RoPE projection for Q and K projections.
- generic_fused_qk_rope_bshd_paged: Performs a fused RoPE projection for Q and K with paged KV cache.
- generic_fused_qkv_matmul_kv_cache_bshd_continuous_batch: Performs a fused QKV matmul. Q outputs are written to the output argument while K and V outputs are written in-place into k_cache and v_cache.
- generic_fused_qkv_matmul_kv_cache_bshd_paged: Performs a fused QKV matmul. Q outputs are written to the output argument while K and V outputs are written in-place into k_cache and v_cache.
- generic_get_continuous_cache
- generic_get_paged_cache
- generic_get_paged_cache_with_scales: Create a PagedKVCacheCollection with scales for MLA attention.
- print_kv_cache_cont_batch_generic_cpu
- print_kv_cache_cont_batch_generic_gpu
- print_kv_cache_paged_generic_cpu
- print_kv_cache_paged_generic_gpu
- rms_norm_kv_cache_ragged_paged: Performs RMSNorm in place on new entries in the key cache.
- rms_norm_value_cache_ragged_paged: Performs RMSNorm in place on new entries in the value cache.
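
To make "RMSNorm in place on new entries" in a paged cache concrete, here is a minimal conceptual sketch in Python. It is not the Mojo API: this page does not show the real signatures, so every name below (`rms_norm`, `rms_norm_new_entries`, `page_table`, `page_size`, `gamma`, `eps`) and the list-of-pages layout are assumptions made only for illustration. The sketch uses the standard RMSNorm formula and touches only the rows for newly written token positions.

```python
import math


def rms_norm(vec, gamma, eps=1e-6):
    """Standard RMSNorm: gamma * vec / sqrt(mean(vec^2) + eps)."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * x * inv_rms for g, x in zip(gamma, vec)]


def rms_norm_new_entries(pages, page_table, page_size, start, num_new, gamma, eps=1e-6):
    """Normalize, in place, the rows for logical positions [start, start + num_new).

    Hypothetical layout: pages[p][slot] is one token's vector, and
    page_table maps a sequence's logical page index to a physical page.
    Rows written before `start` are left untouched.
    """
    for pos in range(start, start + num_new):
        page_id = page_table[pos // page_size]  # logical page -> physical page
        slot = pos % page_size                  # row within that page
        row = pages[page_id][slot]
        row[:] = rms_norm(row, gamma, eps)      # slice assignment mutates the row


# Toy cache: 3 physical pages, 2 token slots per page, vectors of size 2.
page_size = 2
pages = [[[0.0, 0.0] for _ in range(page_size)] for _ in range(3)]
page_table = [2, 0]        # logical pages 0 and 1 live in physical pages 2 and 0

pages[2][0][:] = [3.0, 4.0]  # position 0 (already cached)
pages[2][1][:] = [1.0, 1.0]  # position 1 (already cached)
pages[0][0][:] = [3.0, 4.0]  # position 2 (newly written)

gamma = [1.0, 1.0]
rms_norm_new_entries(pages, page_table, page_size, start=2, num_new=1, gamma=gamma)
```

After the call, only the row for position 2 (physical page 0, slot 0) has been rescaled; the rows for positions 0 and 1 keep their original values, which is the property the "new entries" wording describes.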