Python module
tp_cache_manager
PagedAttention-enabled KV cache for transformer models, leveraging the mo.opaque pattern.
PagedCacheInputSymbols
class max.kv_cache.paged_cache.tp_cache_manager.PagedCacheInputSymbols(kv_blocks: 'BufferType', cache_lengths: 'TensorType', lookup_table: 'TensorType', max_lengths: 'TensorType')
Parameters:
- kv_blocks (BufferType)
- cache_lengths (TensorType)
- lookup_table (TensorType)
- max_lengths (TensorType)
cache_lengths
cache_lengths: TensorType
kv_blocks
kv_blocks: BufferType
lookup_table
lookup_table: TensorType
max_lengths
max_lengths: TensorType
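A minimal sketch of constructing these input symbols. The import path for PagedCacheInputSymbols follows the fully qualified class name above; the TensorType and BufferType constructors are assumed to come from max.graph, and all dtypes and shapes shown are illustrative assumptions, not values prescribed by this module (the real ones are derived from the paged cache manager's configuration, and constructor arguments such as a device parameter may differ by MAX version).

```python
from max.dtype import DType
from max.graph import BufferType, TensorType

from max.kv_cache.paged_cache.tp_cache_manager import PagedCacheInputSymbols

# Illustrative dimensions only; real values come from the cache configuration.
batch_size = 4
max_blocks_per_seq = 128
kv_block_shape = [512, 2, 32, 16, 8, 128]  # assumed layout, for illustration

symbols = PagedCacheInputSymbols(
    # Pool of paged KV blocks (assumed BufferType(dtype, shape) form).
    kv_blocks=BufferType(DType.bfloat16, kv_block_shape),
    # Number of tokens already cached for each sequence in the batch.
    cache_lengths=TensorType(DType.uint32, [batch_size]),
    # Maps each sequence's logical block index to a physical block id.
    lookup_table=TensorType(DType.uint32, [batch_size, max_blocks_per_seq]),
    # Per-step maximum lengths (shape chosen here only as an example).
    max_lengths=TensorType(DType.uint32, [1, 2]),
)
```

These symbols describe the types of the cache inputs at graph-construction time; the concrete buffers and tensors matching them are supplied when the graph is executed.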