Python module
tp_cache_manager
PagedAttention-enabled KV cache for the Transformer, leveraging the mo.opaque pattern.
PagedCacheInputSymbols
class max.kv_cache.paged_cache.tp_cache_manager.PagedCacheInputSymbols(kv_blocks: 'BufferType', cache_lengths: 'TensorType', lookup_table: 'TensorType', max_lengths: 'TensorType')
Parameters:
- kv_blocks (BufferType)
- cache_lengths (TensorType)
- lookup_table (TensorType)
- max_lengths (TensorType)
cache_lengths
cache_lengths: TensorType
kv_blocks
kv_blocks: BufferType
lookup_table
lookup_table: TensorType
max_lengths
max_lengths: TensorType
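Taken together, these fields describe the graph inputs the paged cache expects: the KV block pool itself plus per-batch bookkeeping tensors. A minimal construction sketch, assuming max.graph symbolic types; the dtypes and shape layouts below are illustrative assumptions, not the runtime's exact contract:

```python
from max.dtype import DType
from max.graph import BufferType, DeviceRef, TensorType
from max.kv_cache.paged_cache.tp_cache_manager import PagedCacheInputSymbols

device = DeviceRef.CPU()

symbols = PagedCacheInputSymbols(
    # Pool of paged KV blocks; this 6-D layout is an illustrative assumption.
    kv_blocks=BufferType(
        DType.bfloat16,
        shape=[
            "total_num_pages", 2, "num_layers",
            "page_size", "num_kv_heads", "head_dim",
        ],
        device=device,
    ),
    # Number of tokens already cached for each sequence in the batch.
    cache_lengths=TensorType(DType.uint32, shape=["batch_size"], device=device),
    # Maps each sequence's logical pages to physical pages in kv_blocks.
    lookup_table=TensorType(
        DType.uint32, shape=["batch_size", "max_num_pages"], device=device
    ),
    # Small tensor of per-step maximum lengths consumed by the kernels.
    max_lengths=TensorType(DType.uint32, shape=["steps", 2], device=device),
)
```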
ResetPrefixCacheBackend
class max.kv_cache.paged_cache.tp_cache_manager.ResetPrefixCacheBackend(zmq_endpoint_base)
Parameters:
- zmq_endpoint_base (str)
should_reset_prefix_cache()
should_reset_prefix_cache(blocking=False)
ResetPrefixCacheFrontend
class max.kv_cache.paged_cache.tp_cache_manager.ResetPrefixCacheFrontend(zmq_endpoint_base)
Parameters:
- zmq_endpoint_base (str)
enqueue_reset_prefix_cache()
enqueue_reset_prefix_cache()
Return type:
None
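ResetPrefixCacheFrontend and ResetPrefixCacheBackend form a small one-way ZMQ channel for prefix-cache resets: the frontend enqueues a reset request, and the backend polls for it. A usage sketch, assuming both ends are constructed with the same zmq_endpoint_base; the endpoint string and the boolean return of should_reset_prefix_cache() are assumptions, and in practice the two ends would typically live in different processes:

```python
from max.kv_cache.paged_cache.tp_cache_manager import (
    ResetPrefixCacheBackend,
    ResetPrefixCacheFrontend,
)

# Hypothetical endpoint base shared by both ends of the channel.
zmq_endpoint_base = "ipc:///tmp/reset_prefix_cache"

frontend = ResetPrefixCacheFrontend(zmq_endpoint_base)
backend = ResetPrefixCacheBackend(zmq_endpoint_base)

# Request side (e.g. an API worker): ask for a prefix-cache reset.
frontend.enqueue_reset_prefix_cache()  # returns None

# Scheduler side: poll without blocking; assumed truthy when a
# reset request is pending.
if backend.should_reset_prefix_cache(blocking=False):
    # Drop cached prefix blocks here.
    ...
```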