Python package

kv_cache

KV cache management for efficient attention computation during inference.

This package provides implementations for managing the key-value (KV) caches used by transformer models during inference. The paged attention implementation divides cache memory into fixed-size pages, which reduces fragmentation, improves memory utilization, and enables prefix caching.
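The package's concrete APIs are listed below. As a conceptual illustration only, the following is a minimal sketch of the page-table bookkeeping behind a paged KV cache; the class `PagedKVCacheAllocator`, its method names, and the pool sizes are hypothetical and not part of this package's API.

```python
# Minimal sketch of paged KV cache bookkeeping. All names here
# (PagedKVCacheAllocator, append_token, free_sequence) are hypothetical
# illustrations, not the API of this package.

class PagedKVCacheAllocator:
    """Hands out fixed-size pages from a shared pool, one page table per sequence."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size                    # tokens stored per page
        self.free_pages = list(range(num_pages))      # ids of unused pages
        self.page_tables: dict[int, list[int]] = {}   # seq_id -> allocated page ids
        self.seq_lens: dict[int, int] = {}            # seq_id -> tokens cached so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve the (page_id, slot) where the next token's K/V will be written."""
        pages = self.page_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        slot = length % self.page_size
        if slot == 0:                                 # last page is full (or none yet)
            if not self.free_pages:
                raise MemoryError("KV cache page pool exhausted")
            pages.append(self.free_pages.pop())
        self.seq_lens[seq_id] = length + 1
        return pages[-1], slot

    def free_sequence(self, seq_id: int) -> None:
        """Return a finished sequence's pages to the shared pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


# Usage: 20 tokens with 16-token pages span two pages; freeing recycles both.
alloc = PagedKVCacheAllocator(num_pages=4, page_size=16)
for _ in range(20):
    page_id, slot = alloc.append_token(seq_id=0)
print(alloc.page_tables[0])   # [0, 1]
alloc.free_sequence(0)        # pages 0 and 1 are available again
```

Because every page is the same size, any free page can serve any sequence, so memory is reclaimed and reused without compaction; prefix caching follows naturally when multiple sequences' page tables reference the same pages.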


Packages

  • paged_cache: Paged attention KV cache implementation.
