Python module
naive_cache
Naive KV cache for the Transformer.
NaiveKVCacheInputSymbols
class max.pipelines.kv_cache.naive_cache.NaiveKVCacheInputSymbols(k_cache: max.graph.type.BufferType, v_cache: max.graph.type.BufferType, start_pos: max.graph.type.TensorType, null_op: max.graph.type.TensorType)
k_cache
k_cache: BufferType
null_op
null_op: TensorType
start_pos
start_pos: TensorType
v_cache
v_cache: BufferType
NaiveKVCacheManager
class max.pipelines.kv_cache.naive_cache.NaiveKVCacheManager(params: KVCacheParams, max_batch_size: int, max_seq_len: int, num_layers: int, devices: List[Device], session: InferenceSession)
cache_shape
estimated_memory_size()
classmethod estimated_memory_size(params: KVCacheParams, max_batch_size: int, max_seq_len: int, num_layers: int, available_cache_memory: int, devices: List[Device], **kwargs: Any) → int
Returns the estimated total memory usage of the KV cache, in bytes.
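The source does not spell out the sizing formula, but a naive KV cache conventionally preallocates one key and one value tensor per layer at full batch and sequence length. The sketch below illustrates that standard calculation in plain Python; the `KVCacheParams` stand-in and its field names (`n_kv_heads`, `head_dim`, `dtype_size`) are illustrative assumptions, not the real class from `max.pipelines.kv_cache`.

```python
from dataclasses import dataclass


# Hypothetical stand-in for KVCacheParams; field names are assumptions.
@dataclass
class KVCacheParams:
    n_kv_heads: int   # number of key/value attention heads
    head_dim: int     # dimension per head
    dtype_size: int   # bytes per element, e.g. 2 for bfloat16


def estimated_memory_size(
    params: KVCacheParams,
    max_batch_size: int,
    max_seq_len: int,
    num_layers: int,
) -> int:
    """Conventional KV-cache sizing: one K and one V buffer per layer,
    each of shape [batch, seq, n_kv_heads, head_dim]."""
    bytes_per_token = params.n_kv_heads * params.head_dim * params.dtype_size
    return 2 * num_layers * max_batch_size * max_seq_len * bytes_per_token


# Example: a small Llama-like configuration.
params = KVCacheParams(n_kv_heads=8, head_dim=128, dtype_size=2)
print(estimated_memory_size(params, max_batch_size=1, max_seq_len=2048, num_layers=32))
# 268435456 bytes (256 MiB) for a single full-length sequence
```

The factor of 2 accounts for keys and values being cached separately; the actual MAX implementation may add per-device overhead not modeled here.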
infer_optimal_batch_size()
classmethod infer_optimal_batch_size(params: KVCacheParams, max_seq_len: int, num_layers: int, available_cache_memory: int, devices: List[Device], **kwargs: Any) → int
Returns the estimated optimal batch size for the KV cache, given the available cache memory.
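Again the heuristic is not spelled out in the source. For a naive cache that preallocates full-length buffers per sequence, one common approach is to divide the available cache memory by the per-sequence cost; the sketch below illustrates that idea under the same assumed parameters as above, and is not MAX's actual implementation.

```python
def kv_bytes_per_sequence(
    max_seq_len: int,
    num_layers: int,
    n_kv_heads: int,
    head_dim: int,
    dtype_size: int,
) -> int:
    """Bytes needed to cache K and V for one full-length sequence."""
    return 2 * num_layers * max_seq_len * n_kv_heads * head_dim * dtype_size


def infer_optimal_batch_size(available_cache_memory: int, per_seq_bytes: int) -> int:
    """A naive cache holds whole sequences, so the batch size is simply
    how many full-length sequences fit in the memory budget (at least 1)."""
    return max(1, available_cache_memory // per_seq_bytes)


per_seq = kv_bytes_per_sequence(
    max_seq_len=2048, num_layers=32, n_kv_heads=8, head_dim=128, dtype_size=2
)
print(infer_optimal_batch_size(8 * 1024**3, per_seq))  # 8 GiB budget -> 32
```

The real method may also clamp the result to `max_batch_size` or reserve headroom for activations; those details are outside what the signature shows.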
input_symbols()
input_symbols() → List[NaiveKVCacheInputSymbols]
Returns the input symbols for the KV cache manager, one NaiveKVCacheInputSymbols entry per device.