Python module

tp_cache_manager

PagedAttention-enabled KV cache for the Transformer, leveraging the mo.opaque pattern.

PagedCacheInputSymbols

class max.kv_cache.paged_cache.tp_cache_manager.PagedCacheInputSymbols(kv_blocks: 'BufferType', cache_lengths: 'TensorType', lookup_table: 'TensorType', max_lengths: 'TensorType')

Parameters:

kv_blocks (BufferType)

cache_lengths (TensorType)

lookup_table (TensorType)

max_lengths (TensorType)
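
Since the fields are graph type descriptions rather than live tensors, constructing the symbols amounts to describing the cache layout. A minimal sketch, assuming TensorType, BufferType, and DeviceRef come from max.graph as in recent MAX releases; the dtypes and every symbolic shape name below are illustrative, not documented defaults:

```python
from max.dtype import DType
from max.graph import BufferType, DeviceRef, TensorType
from max.kv_cache.paged_cache.tp_cache_manager import PagedCacheInputSymbols

# Hypothetical paged-cache layout; shape names are assumed symbolic dims.
device = DeviceRef.GPU()
symbols = PagedCacheInputSymbols(
    kv_blocks=BufferType(
        DType.bfloat16,
        shape=["total_num_pages", 2, "num_layers", "page_size", "num_kv_heads", "head_dim"],
        device=device,
    ),
    cache_lengths=TensorType(DType.uint32, shape=["batch_size"], device=device),
    lookup_table=TensorType(DType.uint32, shape=["batch_size", "max_pages_per_seq"], device=device),
    max_lengths=TensorType(DType.uint32, shape=["steps_remaining", 2], device=DeviceRef.CPU()),
)
```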

ResetPrefixCacheBackend

class max.kv_cache.paged_cache.tp_cache_manager.ResetPrefixCacheBackend(zmq_endpoint_base)

Parameters:

zmq_endpoint_base (str)

should_reset_prefix_cache()

should_reset_prefix_cache(blocking=False)

Parameters:

blocking (bool)

Return type:

bool
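
A sketch of how the backend half might be polled, assuming it is owned by the process that manages the paged cache; the endpoint string and the reset action are illustrative, not documented defaults:

```python
from max.kv_cache.paged_cache.tp_cache_manager import ResetPrefixCacheBackend

# "ipc:///tmp/reset_prefix_cache" is a hypothetical endpoint base; it must
# match the one given to the ResetPrefixCacheFrontend below.
backend = ResetPrefixCacheBackend("ipc:///tmp/reset_prefix_cache")

# Non-blocking poll inside the cache manager's scheduling loop: returns
# True once a reset has been enqueued on the frontend side.
if backend.should_reset_prefix_cache(blocking=False):
    ...  # evict/clear the prefix-cache blocks here (application-specific)
```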

ResetPrefixCacheFrontend

class max.kv_cache.paged_cache.tp_cache_manager.ResetPrefixCacheFrontend(zmq_endpoint_base)

Parameters:

zmq_endpoint_base (str)

enqueue_reset_prefix_cache()

Return type:

None
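
The frontend is the enqueueing half of the pair. Constructed with the same endpoint base (the value shown is the same hypothetical one used above), a sketch of signaling a reset:

```python
from max.kv_cache.paged_cache.tp_cache_manager import ResetPrefixCacheFrontend

# Same hypothetical endpoint base as the backend above, so the underlying
# ZMQ sockets pair up.
frontend = ResetPrefixCacheFrontend("ipc:///tmp/reset_prefix_cache")

# Fire-and-forget: returns None and does not wait for the backend to
# observe or act on the request.
frontend.enqueue_reset_prefix_cache()
```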