Python module

tp_cache_manager

PagedAttention-enabled KV cache for Transformer models, implemented via the mo.opaque pattern.

PagedCacheInputSymbols

class max.kv_cache.paged_cache.tp_cache_manager.PagedCacheInputSymbols(kv_blocks: 'BufferType', cache_lengths: 'TensorType', lookup_table: 'TensorType', max_lengths: 'TensorType')

Parameters:

- kv_blocks: BufferType
- cache_lengths: TensorType
- lookup_table: TensorType
- max_lengths: TensorType
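PagedCacheInputSymbols is a plain container grouping the graph input symbols the paged KV cache needs each step. The sketch below mirrors its shape; the BufferType and TensorType stand-ins are hypothetical simplifications (in MAX these come from the graph API) so the example runs standalone.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for max.graph's BufferType / TensorType,
# used only so this sketch runs without MAX installed.
@dataclass
class BufferType:
    name: str


@dataclass
class TensorType:
    name: str


# Mirror of PagedCacheInputSymbols: groups the four input symbols
# the paged KV cache manager wires into the graph.
@dataclass
class PagedCacheInputSymbols:
    kv_blocks: BufferType      # pool of paged KV blocks
    cache_lengths: TensorType  # cached length per sequence
    lookup_table: TensorType   # logical page -> physical block mapping
    max_lengths: TensorType    # per-step maximum lengths


symbols = PagedCacheInputSymbols(
    kv_blocks=BufferType("kv_blocks"),
    cache_lengths=TensorType("cache_lengths"),
    lookup_table=TensorType("lookup_table"),
    max_lengths=TensorType("max_lengths"),
)
print(symbols.lookup_table.name)
```

Because it is a dataclass, the fields are accessed by name in the same order they appear in the constructor signature.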