Skip to main content

Python package

paged_cache

Paged attention KV cache implementation with support for distributed inference.

This package provides the core implementation of paged KV cache management, including cache managers, transfer engines for distributed settings, and tensor parallelism support.

Modules

Classes

Was this page helpful?