Python package
paged_cache
Paged attention KV cache implementation with support for distributed inference.
This package provides the core implementation of paged KV cache management, including cache managers, transfer engines for distributed settings, and tensor parallelism support.
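In a paged KV cache, the cache is split into fixed-size pages. Each sequence keeps a block table that maps its logical pages to physical pages drawn from a shared free pool. The sketch below shows only that bookkeeping. The class `ToyPagedAllocator` and its methods `append_tokens`, `physical_slot`, and `release` are hypothetical illustrations, not the `PagedKVCacheManager` API, and no tensors are allocated.

```python
class ToyPagedAllocator:
    """Hypothetical toy allocator; NOT the paged_cache API.

    Tracks which fixed-size physical pages each sequence owns, the way a
    paged KV cache manager would, without allocating any tensors.
    """

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size  # tokens stored per page
        # Free physical page ids; stored reversed so pop() hands out page 0 first.
        self.free_pages = list(range(num_pages - 1, -1, -1))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical pages
        self.seq_lens: dict[int, int] = {}  # seq_id -> tokens cached so far

    def append_tokens(self, seq_id: int, n: int) -> None:
        """Grow a sequence by n tokens, claiming new pages only when needed."""
        table = list(self.block_tables.get(seq_id, []))
        new_len = self.seq_lens.get(seq_id, 0) + n
        needed = -(-new_len // self.page_size)  # ceil(new_len / page_size)
        extra = needed - len(table)
        if extra > len(self.free_pages):
            raise MemoryError("not enough free KV cache pages")
        for _ in range(extra):
            table.append(self.free_pages.pop())
        self.block_tables[seq_id] = table
        self.seq_lens[seq_id] = new_len

    def physical_slot(self, seq_id: int, pos: int) -> tuple[int, int]:
        """Map a token position to (physical page id, offset within that page)."""
        table = self.block_tables[seq_id]
        return table[pos // self.page_size], pos % self.page_size

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's pages to the free pool."""
        self.free_pages.extend(reversed(self.block_tables.pop(seq_id, [])))
        self.seq_lens.pop(seq_id, None)
```

Because pages are claimed on demand and returned on completion, sequences of different lengths share one pool without reserving memory for their maximum length.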
Modules
cache_manager: Core paged KV cache manager implementation.
tp_cache_manager: Tensor parallelism cache manager and input symbols.
transfer_engine: KV cache transfer engine for distributed inference.
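Under tensor parallelism, a common scheme (Megatron-style head partitioning) splits the KV heads evenly across devices. Each device then caches only its own heads. This page does not say whether `tp_cache_manager` shards exactly this way, so treat the sketch below as an assumption. The helpers `shard_kv_heads` and `per_device_page_bytes` are hypothetical. They show which heads each rank would own and how much memory one page would take per device.

```python
def shard_kv_heads(num_kv_heads: int, tp_degree: int) -> list[range]:
    """Hypothetical helper: contiguous KV-head ranges owned by each TP rank."""
    if num_kv_heads % tp_degree != 0:
        raise ValueError("num_kv_heads must be divisible by tp_degree")
    per_rank = num_kv_heads // tp_degree
    return [range(r * per_rank, (r + 1) * per_rank) for r in range(tp_degree)]


def per_device_page_bytes(
    page_size: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    tp_degree: int,
    dtype_bytes: int = 2,  # assumes a 16-bit dtype such as bfloat16
) -> int:
    """Bytes one page occupies on a single device, counting both K and V."""
    heads_per_device = num_kv_heads // tp_degree
    # Factor 2 = separate K and V tensors, each [page_size, heads, head_dim] per layer.
    return 2 * num_layers * page_size * heads_per_device * head_dim * dtype_bytes
```

For example, `shard_kv_heads(8, 4)` gives each of four ranks two consecutive heads. Doubling `tp_degree` halves the per-device footprint of every page.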
Classes
PagedKVCacheManager: Manager for paged KV cache with data and tensor parallelism support.
PagedCacheInputSymbols: Symbolic inputs for paged KV cache operations.
KVTransferEngine: Manages KV cache transfers between devices.
KVTransferEngineMetadata: Metadata for transfer engine configuration.
TransferReqData: Transfer request data structure.