Python class
KVCacheInputsPerDevice
class max.nn.kv_cache.KVCacheInputsPerDevice(kv_blocks, cache_lengths, lookup_table, max_lengths, kv_scales=None, attention_dispatch_metadata=None, draft_attention_dispatch_metadata=None)
Bases: Generic[_Tensor, _Buffer]
Symbolic graph input types for a single device’s paged KV cache.
Parameters:
- kv_blocks (_Buffer)
- cache_lengths (_Tensor)
- lookup_table (_Tensor)
- max_lengths (_Tensor)
- kv_scales (_Buffer | None)
- attention_dispatch_metadata (_Tensor | None)
- draft_attention_dispatch_metadata (_Tensor | None)
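The following hypothetical pure-Python mirror of the constructor's fields shows how the generic parametrization works. PerDeviceInputsSketch, TensorT, and BufferT are illustrative names and are not part of max.nn.kv_cache. Plain strings stand in for the symbolic tensor and buffer types. The sketch assumes only what the signature above shows: four required fields and three optional ones that default to None.

```python
from dataclasses import dataclass, fields
from typing import Generic, Optional, TypeVar

# Hypothetical stand-ins for the documented _Tensor and _Buffer type parameters.
TensorT = TypeVar("TensorT")
BufferT = TypeVar("BufferT")


@dataclass
class PerDeviceInputsSketch(Generic[TensorT, BufferT]):
    """Illustrative mirror of the documented fields, not the MAX class itself."""

    kv_blocks: BufferT                      # typed _Buffer in the documented signature
    cache_lengths: TensorT                  # typed _Tensor
    lookup_table: TensorT                   # typed _Tensor
    max_lengths: TensorT                    # typed _Tensor
    kv_scales: Optional[BufferT] = None     # optional buffer, defaults to None
    attention_dispatch_metadata: Optional[TensorT] = None
    draft_attention_dispatch_metadata: Optional[TensorT] = None


# Parametrize with str so plain strings can stand in for symbolic types.
inputs = PerDeviceInputsSketch[str, str](
    kv_blocks="blocks_buf",
    cache_lengths="cache_lens",
    lookup_table="lut",
    max_lengths="max_lens",
)
field_names = [f.name for f in fields(inputs)]
print(field_names)
print(inputs.kv_scales)  # None, because the optional field was not passed
```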
attention_dispatch_metadata: _Tensor | None = None
cache_lengths: _Tensor
draft_attention_dispatch_metadata: _Tensor | None = None
flatten()
Returns the inputs as a flat list of tensors and buffers.
Return type: list[_Tensor | _Buffer]
flatten_without_attention_dispatch_metadata()
Like flatten(), but leaves out attention_dispatch_metadata.
Return type: list[_Tensor | _Buffer]
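The page does not document the element order of either flattening method. This hypothetical sketch makes three assumptions. First, flatten() returns fields in declaration order. Second, flatten() skips optional fields left as None, which the list[_Tensor | _Buffer] element type, with no None, suggests. Third, flatten_without_attention_dispatch_metadata() drops only attention_dispatch_metadata and keeps the draft metadata. PerDeviceInputsSketch is an illustrative name, not the MAX class.

```python
from dataclasses import dataclass, fields
from typing import Generic, Optional, TypeVar

TensorT = TypeVar("TensorT")
BufferT = TypeVar("BufferT")


@dataclass
class PerDeviceInputsSketch(Generic[TensorT, BufferT]):
    """Hypothetical mirror of the documented fields, not the MAX class."""

    kv_blocks: BufferT
    cache_lengths: TensorT
    lookup_table: TensorT
    max_lengths: TensorT
    kv_scales: Optional[BufferT] = None
    attention_dispatch_metadata: Optional[TensorT] = None
    draft_attention_dispatch_metadata: Optional[TensorT] = None

    def flatten(self) -> list:
        # Assumption: declaration order, with optionals left as None skipped.
        values = (getattr(self, f.name) for f in fields(self))
        return [v for v in values if v is not None]

    def flatten_without_attention_dispatch_metadata(self) -> list:
        # Assumption: same as flatten() except attention_dispatch_metadata is
        # excluded; draft_attention_dispatch_metadata is still included if set.
        return [
            getattr(self, f.name)
            for f in fields(self)
            if f.name != "attention_dispatch_metadata"
            and getattr(self, f.name) is not None
        ]


inputs = PerDeviceInputsSketch[str, str](
    "blocks", "lens", "lut", "max", attention_dispatch_metadata="dispatch"
)
print(inputs.flatten())
# ['blocks', 'lens', 'lut', 'max', 'dispatch']
print(inputs.flatten_without_attention_dispatch_metadata())
# ['blocks', 'lens', 'lut', 'max']
```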
kv_blocks: _Buffer
kv_scales: _Buffer | None = None
lookup_table: _Tensor
max_lengths: _Tensor
unflatten(it)
Parameters:
- it
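The page does not say what it is or how unflatten() rebuilds the object. This hypothetical sketch assumes one plausible reading, which the source does not confirm. Under that reading, it is an iterator over flat values, and the existing instance serves as a template: each field that is set on the instance consumes the next value from it, and unset optionals stay None. The sketch includes flatten() under the same assumed ordering so the round trip can be checked. PerDeviceInputsSketch is an illustrative name, not the MAX class.

```python
from dataclasses import dataclass, fields, replace
from typing import Generic, Iterator, Optional, TypeVar

TensorT = TypeVar("TensorT")
BufferT = TypeVar("BufferT")


@dataclass
class PerDeviceInputsSketch(Generic[TensorT, BufferT]):
    """Hypothetical mirror of the documented fields, not the MAX class."""

    kv_blocks: BufferT
    cache_lengths: TensorT
    lookup_table: TensorT
    max_lengths: TensorT
    kv_scales: Optional[BufferT] = None
    attention_dispatch_metadata: Optional[TensorT] = None
    draft_attention_dispatch_metadata: Optional[TensorT] = None

    def flatten(self) -> list:
        # Assumed ordering: declaration order, optionals left as None skipped.
        return [getattr(self, f.name) for f in fields(self)
                if getattr(self, f.name) is not None]

    def unflatten(self, it: Iterator) -> "PerDeviceInputsSketch":
        # Assumption: self is a template. Each field set on self takes the
        # next value from `it`, in declaration order; unset optionals stay None.
        updates = {f.name: next(it) for f in fields(self)
                   if getattr(self, f.name) is not None}
        return replace(self, **updates)


# A template with kv_scales set and both dispatch-metadata fields unset.
template = PerDeviceInputsSketch("kv_blocks_t", "cache_lengths_t",
                                 "lookup_table_t", "max_lengths_t",
                                 kv_scales="kv_scales_t")
values = iter(["blocks", "lens", "lut", "max", "scales", "next_device"])
restored = template.unflatten(values)
leftover = list(values)

print(restored.flatten())  # ['blocks', 'lens', 'lut', 'max', 'scales']
print(leftover)            # ['next_device']
```

Under this assumption, unflatten() consumes only as many values as the template needs. Any remaining values stay in the iterator for whatever reads it next, such as another device's inputs.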