Mojo module `kv_cache`
Aliases

- `embed_fn_type`: a capturing elementwise function that maps an `IndexList[4]` index and a `SIMD[dtype, width]` vector to a new `SIMD[dtype, width]` vector.

  ```mojo
  alias embed_fn_type = fn[dtype: DType, width: Int](IndexList[4], SIMD[dtype, width]) capturing -> SIMD[dtype, width]
  ```
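As an analogy only (the function name, index layout, and rotation scheme below are assumptions, not part of the Mojo API), a callable matching `embed_fn_type`'s shape can be sketched in Python: it receives a 4-D index and a vector of lane values, and returns the transformed vector. Here the "embedding" is a RoPE-style rotation keyed off an assumed sequence-position component of the index.

```python
import math

def rope_embed(idx, vec):
    # Hypothetical index layout: (batch, seq_pos, head, dim_start).
    batch, seq_pos, head, dim_start = idx
    out = []
    # Rotate consecutive (even, odd) lane pairs by a position-dependent angle.
    for i in range(0, len(vec), 2):
        d = dim_start + i
        theta = seq_pos * (10000.0 ** (-d / len(vec)))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out
```

At sequence position 0 the rotation angle is zero, so the vector passes through unchanged; at later positions each pair is rotated but its norm is preserved.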
Functions

- `generic_flash_attention_kv_cache_padded`
- `generic_flash_attention_kv_cache_padded_materialized_mask`
- `generic_fused_qk_rope_bshd_continuous_batch`: Performs a fused RoPE projection for the Q and K projections.
- `generic_fused_qkv_matmul_kv_cache_bshd_continuous_batch`: Performs a fused QKV matmul. Q outputs are written to the output argument, while K and V outputs are written in place into `k_cache` and `v_cache`.
- `generic_get_continuous_cache`
- `generic_get_paged_cache`
- `managed_tensor_slice_to_ndbuffer`
- `print_kv_cache_cont_batch_generic_cpu`
- `print_kv_cache_cont_batch_generic_gpu`
- `print_kv_cache_paged_generic_cpu`
- `print_kv_cache_paged_generic_gpu`
- `rms_norm_kv_cache_ragged_continuous_batching`: Performs RMSNorm in place on new entries in the key cache.
- `rms_norm_kv_cache_ragged_paged`: Performs RMSNorm in place on new entries in the key cache.
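The in-place RMSNorm behavior described for the last two functions can be sketched in plain Python (a simplification under assumed names; the real kernels operate on ragged/paged cache layouts on device): each newly appended key-cache row is scaled by its root-mean-square and an elementwise gain, mutating the cache storage directly.

```python
def rms_norm_inplace(rows, gamma, eps=1e-6):
    # Normalize each row in place: x <- x / rms(x) * gamma.
    for row in rows:
        rms = (sum(v * v for v in row) / len(row) + eps) ** 0.5
        for i in range(len(row)):
            row[i] = row[i] / rms * gamma[i]

# Toy key cache; only the second row is "new", mirroring
# "RMSNorm in place on new entries in the key cache".
k_cache = [[1.0, 1.0], [3.0, 4.0]]
new_rows = k_cache[1:]  # slices share the underlying row lists
rms_norm_inplace(new_rows, gamma=[1.0, 1.0])
```

Because the slice holds references to the same row objects, the normalization is visible through `k_cache` itself, while earlier (already-normalized) entries are left untouched.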