Mojo module
kv_cache
Aliases
- embed_fn_type = fn[DType, Int](Index[4], SIMD[$0, $1]) capturing -> SIMD[$0, $1]
Functions
- generic_flash_attention_kv_cache_causal_alibi_mask_continuous_batch
- generic_flash_attention_kv_cache_causal_mask_continuous_batch
- generic_flash_attention_kv_cache_continuous_batch
- generic_fused_qk_rope_bshd_continuous_batch: Performs a fused RoPE projection for Q and K projections.
- generic_fused_qkv_matmul_kv_cache_bshd_continuous_batch: Performs a fused QKV matmul. Q outputs are written to the output argument while K and V outputs are written in-place into k_cache and v_cache.
- generic_get_continuous_cache
- generic_get_paged_cache
- print_kv_cache_cont_batch_generic_cpu
- print_kv_cache_cont_batch_generic_gpu
- print_kv_cache_paged_generic_cpu
- print_kv_cache_paged_generic_gpu
- rms_norm_kv_cache_ragged_continuous_batching: Performs RMSNorm in place on new entries in the key cache.
- rms_norm_kv_cache_ragged_paged: Performs RMSNorm in place on new entries in the key cache.
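
The fused QKV matmul entry above describes Q being written to the output argument while K and V are written in place into the caches. A minimal conceptual sketch of that data flow in plain Python follows. It is not the Mojo API: the function name `fused_qkv_matmul_sketch`, its parameters, and the use of nested lists in place of the library's cache types are all assumptions made for illustration, and the real kernel's signature and memory layout will differ.

```python
def fused_qkv_matmul_sketch(hidden, w_qkv, q_dim, kv_dim,
                            k_cache, v_cache, start_pos, output):
    """Hypothetical illustration, not the library's implementation.

    hidden:   [seq_len][hidden_dim] new token activations
    w_qkv:    [hidden_dim][q_dim + 2 * kv_dim] concatenated Q|K|V weights
    k_cache:  [max_len][kv_dim] rows, updated in place
    v_cache:  [max_len][kv_dim] rows, updated in place
    output:   [seq_len][q_dim] rows, receives Q
    """
    total = q_dim + 2 * kv_dim
    for t, row in enumerate(hidden):
        # One pass over the concatenated weight yields Q, K and V together.
        fused = [sum(x * w_qkv[i][j] for i, x in enumerate(row))
                 for j in range(total)]
        # Q goes to the caller-provided output buffer.
        output[t] = fused[:q_dim]
        # K and V go straight into the cache slots for the new positions.
        k_cache[start_pos + t] = fused[q_dim:q_dim + kv_dim]
        v_cache[start_pos + t] = fused[q_dim + kv_dim:]


# Usage: one new token appended at cache position 2.
hidden = [[1.0, 2.0]]
w_qkv = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
k_cache = [[0.0] for _ in range(4)]
v_cache = [[0.0] for _ in range(4)]
output = [[0.0]]
fused_qkv_matmul_sketch(hidden, w_qkv, 1, 1, k_cache, v_cache, 2, output)
```

Only the slots for the new positions change, so earlier cache entries remain valid for subsequent attention calls. Writing K and V directly into the cache avoids a separate copy step after the projection.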