Nightly build: This page might be unfinished. See the stable version.

Python module

naive_cache

Naive KV cache for the Transformer.

`NaiveKVCacheManager`

class max.pipelines.kv_cache.naive_cache.NaiveKVCacheManager(params: KVCacheParams, max_cache_batch_size: int, max_seq_len: int, num_layers: int, devices: List[Device], session: InferenceSession)

`cache_shape`

property cache_shape: list[int]

`estimated_memory_size()`

classmethod estimated_memory_size(params: KVCacheParams, max_cache_batch_size: int, max_seq_len: int, num_layers: int, available_cache_memory: int, devices: List[Device]) → int

Returns the estimated total memory usage of the KV cache.
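
This page does not say how the estimate is computed. As a rough illustration only, the sketch below shows the usual back-of-the-envelope size of a dense, preallocated KV cache: one key tensor and one value tensor per layer, each covering every batch slot and sequence position. `ToyKVParams`, its fields (`n_kv_heads`, `head_dim`, `dtype_bytes`), and `toy_estimated_memory_size` are hypothetical names introduced for this example, not part of the `max` API, and the sketch ignores the `available_cache_memory` and `devices` arguments that the real method accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKVParams:
    """Hypothetical stand-in for KVCacheParams; the field names are assumptions."""

    n_kv_heads: int
    head_dim: int
    dtype_bytes: int  # bytes per element, e.g. 2 for a 16-bit float


def toy_estimated_memory_size(
    params: ToyKVParams,
    max_cache_batch_size: int,
    max_seq_len: int,
    num_layers: int,
) -> int:
    """Return the size in bytes of a dense cache holding keys and values for every slot."""
    elements = (
        2  # one tensor for keys, one for values
        * num_layers
        * max_cache_batch_size
        * max_seq_len
        * params.n_kv_heads
        * params.head_dim
    )
    return elements * params.dtype_bytes


params = ToyKVParams(n_kv_heads=8, head_dim=128, dtype_bytes=2)
size = toy_estimated_memory_size(
    params, max_cache_batch_size=16, max_seq_len=2048, num_layers=32
)
print(f"{size / 2**30:.1f} GiB")  # prints "4.0 GiB"
```

Because the size grows linearly in both `max_cache_batch_size` and `max_seq_len`, doubling either one doubles the estimate in this simplified model.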

`input_symbols()`

input_symbols() → List[tuple[max.graph.type.BufferType, max.graph.type.BufferType, max.graph.type.TensorType, max.graph.type.TensorType]]
