Python module

tp_cache_manager

PagedAttention-enabled KV cache for the Transformer, leveraging the mo.opaque pattern.

PagedCacheInputSymbols

class max.kv_cache.paged_cache.tp_cache_manager.PagedCacheInputSymbols(kv_blocks: 'BufferType', cache_lengths: 'TensorType', lookup_table: 'TensorType', max_lengths: 'TensorType')

Parameters:

kv_blocks (BufferType)

cache_lengths (TensorType)

lookup_table (TensorType)

max_lengths (TensorType)
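
Since the fields are graph type descriptions rather than live tensors, constructing the symbols amounts to describing the cache layout. A minimal sketch, assuming TensorType, BufferType, and DeviceRef come from max.graph as in recent MAX releases; the dtypes and every symbolic shape name below are illustrative, not documented defaults:

```python
from max.dtype import DType
from max.graph import BufferType, DeviceRef, TensorType
from max.kv_cache.paged_cache.tp_cache_manager import PagedCacheInputSymbols

# Hypothetical paged-cache layout; shape names are assumed symbolic dims.
device = DeviceRef.GPU()
symbols = PagedCacheInputSymbols(
    kv_blocks=BufferType(
        DType.bfloat16,
        shape=["total_num_pages", 2, "num_layers", "page_size", "num_kv_heads", "head_dim"],
        device=device,
    ),
    cache_lengths=TensorType(DType.uint32, shape=["batch_size"], device=device),
    lookup_table=TensorType(DType.uint32, shape=["batch_size", "max_pages_per_seq"], device=device),
    max_lengths=TensorType(DType.uint32, shape=["steps_remaining", 2], device=DeviceRef.CPU()),
)
```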

ResetPrefixCacheBackend

class max.kv_cache.paged_cache.tp_cache_manager.ResetPrefixCacheBackend(zmq_endpoint_base)

Parameters:

zmq_endpoint_base (str)

should_reset_prefix_cache()

should_reset_prefix_cache(blocking=False)

Parameters:

blocking (bool)

Return type:

bool
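
A sketch of how the backend half might be polled, assuming it is owned by the process that manages the paged cache; the endpoint string and the reset action are illustrative, not documented defaults:

```python
from max.kv_cache.paged_cache.tp_cache_manager import ResetPrefixCacheBackend

# "ipc:///tmp/reset_prefix_cache" is a hypothetical endpoint base; it must
# match the one given to the ResetPrefixCacheFrontend below.
backend = ResetPrefixCacheBackend("ipc:///tmp/reset_prefix_cache")

# Non-blocking poll inside the cache manager's scheduling loop: returns
# True once a reset has been enqueued on the frontend side.
if backend.should_reset_prefix_cache(blocking=False):
    ...  # evict/clear the prefix-cache blocks here (application-specific)
```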

ResetPrefixCacheFrontend

class max.kv_cache.paged_cache.tp_cache_manager.ResetPrefixCacheFrontend(zmq_endpoint_base)

Parameters:

zmq_endpoint_base (str)

enqueue_reset_prefix_cache()

Return type:

None
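
The frontend is the enqueueing half of the pair. Constructed with the same endpoint base (the value shown is the same hypothetical one used above), a sketch of signaling a reset:

```python
from max.kv_cache.paged_cache.tp_cache_manager import ResetPrefixCacheFrontend

# Same hypothetical endpoint base as the backend above, so the underlying
# ZMQ sockets pair up.
frontend = ResetPrefixCacheFrontend("ipc:///tmp/reset_prefix_cache")

# Fire-and-forget: returns None and does not wait for the backend to
# observe or act on the request.
frontend.enqueue_reset_prefix_cache()
```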