Python module

naive_cache

Naive KV cache for the Transformer.

NaiveKVCacheInputSymbols

class max.pipelines.kv_cache.naive_cache.NaiveKVCacheInputSymbols(k_cache: max.graph.type.BufferType, v_cache: max.graph.type.BufferType, start_pos: max.graph.type.TensorType, null_op: max.graph.type.TensorType)

k_cache

k_cache: BufferType

null_op

null_op: TensorType

start_pos

start_pos: TensorType

v_cache

v_cache: BufferType

NaiveKVCacheManager

class max.pipelines.kv_cache.naive_cache.NaiveKVCacheManager(params: KVCacheParams, max_batch_size: int, max_seq_len: int, num_layers: int, devices: List[Device], session: InferenceSession)

cache_shape

property cache_shape: list[int]

estimated_memory_size()

classmethod estimated_memory_size(params: KVCacheParams, max_batch_size: int, max_seq_len: int, num_layers: int, available_cache_memory: int, devices: List[Device], **kwargs: Any) → int

Returns the estimated total memory usage of the KV cache, in bytes.
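As a rough illustration of what such an estimate involves, the sketch below computes the worst-case footprint of a naive (fully preallocated) KV cache from its dimensions. The function name mirrors the API, but the parameter list (`num_kv_heads`, `head_dim`, `dtype_bytes`, `num_devices`) and the exact formula are assumptions for illustration, not the MAX implementation.

```python
def estimated_memory_size(
    max_batch_size: int,
    max_seq_len: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int,
    num_devices: int = 1,
) -> int:
    """Estimated total bytes for preallocated K and V buffers.

    Illustrative sketch only: assumes one cache entry per
    (layer, batch slot, position, KV head, head-dim element),
    stored twice (once for K, once for V), replicated per device.
    """
    per_device = (
        2  # K buffer + V buffer
        * num_layers
        * max_batch_size
        * max_seq_len
        * num_kv_heads
        * head_dim
        * dtype_bytes
    )
    return per_device * num_devices
```

For example, a batch of 1 at 2048 tokens, 32 layers, 8 KV heads of dim 128 in a 2-byte dtype works out to 256 MiB under these assumptions.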

infer_optimal_batch_size()

classmethod infer_optimal_batch_size(params: KVCacheParams, max_seq_len: int, num_layers: int, available_cache_memory: int, devices: List[Device], **kwargs: Any) → int

Returns the estimated optimal batch size for the KV cache.
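One plausible way to derive such a batch size is to divide the available cache memory by the worst-case per-sequence cost; the sketch below shows that arithmetic. The parameters `num_kv_heads`, `head_dim`, and `dtype_bytes` are illustrative assumptions, and the real method may weigh additional factors.

```python
def infer_optimal_batch_size(
    available_cache_memory: int,
    max_seq_len: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int,
) -> int:
    """Largest batch whose worst-case KV cache fits the memory budget.

    Illustrative sketch only, not the MAX implementation.
    """
    # Bytes to cache K and V for one sequence at full length.
    per_sequence = (
        2 * num_layers * max_seq_len * num_kv_heads * head_dim * dtype_bytes
    )
    # Floor-divide the budget, keeping at least a batch of one.
    return max(1, available_cache_memory // per_sequence)
```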

input_symbols()

input_symbols() → List[NaiveKVCacheInputSymbols]

Returns the input symbols for the KV cache manager.
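To make the shape of the returned value concrete, here is a plain-Python stand-in for `NaiveKVCacheInputSymbols`: a simple container holding one symbol per graph input. In the real API the fields are `max.graph` `BufferType`/`TensorType` values; strings are used below only so the sketch is self-contained.

```python
from dataclasses import dataclass


@dataclass
class InputSymbolsSketch:
    """Illustrative stand-in for NaiveKVCacheInputSymbols."""

    k_cache: str    # BufferType in the real API
    v_cache: str    # BufferType
    start_pos: str  # TensorType
    null_op: str    # TensorType


# input_symbols() returns one such container per device/graph input set.
symbols = [InputSymbolsSketch("k_cache", "v_cache", "start_pos", "null_op")]
```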