Python class
KVCacheInputsPerDevice
class max.nn.kv_cache.KVCacheInputsPerDevice(kv_blocks, cache_lengths, lookup_table, max_lengths, kv_scales=None, attention_dispatch_metadata=None, draft_attention_dispatch_metadata=None)
Bases: Generic[_Tensor, _Buffer]
Symbolic graph input types for a single device's paged KV cache.
Parameters:
- kv_blocks (_Buffer)
- cache_lengths (_Tensor)
- lookup_table (_Tensor)
- max_lengths (_Tensor)
- kv_scales (_Buffer | None)
- attention_dispatch_metadata (_Tensor | None)
- draft_attention_dispatch_metadata (_Tensor | None)
attention_dispatch_metadata
attention_dispatch_metadata: _Tensor | None = None
cache_lengths
cache_lengths: _Tensor
draft_attention_dispatch_metadata
draft_attention_dispatch_metadata: _Tensor | None = None
flatten()
flatten()
Return type: list[_Tensor | _Buffer]
flatten_without_attention_dispatch_metadata()
flatten_without_attention_dispatch_metadata()
Return type: list[_Tensor | _Buffer]
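
As a rough illustration of what the two flatten methods might do, the sketch below uses a hypothetical stand-in dataclass, `SketchKVInputs`, rather than the real `max.nn.kv_cache` class, with plain strings in place of tensors and buffers. The output order and the rule that `None` fields are skipped are assumptions; this page documents only the return type.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

_Tensor = TypeVar("_Tensor")
_Buffer = TypeVar("_Buffer")


@dataclass
class SketchKVInputs(Generic[_Tensor, _Buffer]):
    """Hypothetical stand-in that mirrors the documented field names."""

    kv_blocks: _Buffer
    cache_lengths: _Tensor
    lookup_table: _Tensor
    max_lengths: _Tensor
    kv_scales: Optional[_Buffer] = None
    attention_dispatch_metadata: Optional[_Tensor] = None
    draft_attention_dispatch_metadata: Optional[_Tensor] = None

    def flatten_without_attention_dispatch_metadata(self) -> list:
        # Assumption: required fields in declaration order, then kv_scales if set.
        items = [
            self.kv_blocks,
            self.cache_lengths,
            self.lookup_table,
            self.max_lengths,
        ]
        if self.kv_scales is not None:
            items.append(self.kv_scales)
        return items

    def flatten(self) -> list:
        # Assumption: dispatch metadata entries come last, skipping None.
        items = self.flatten_without_attention_dispatch_metadata()
        for extra in (
            self.attention_dispatch_metadata,
            self.draft_attention_dispatch_metadata,
        ):
            if extra is not None:
                items.append(extra)
        return items


inputs = SketchKVInputs(
    kv_blocks="blocks",
    cache_lengths="cache_lengths",
    lookup_table="lookup_table",
    max_lengths="max_lengths",
    attention_dispatch_metadata="dispatch",
)
flat = inputs.flatten()
flat_no_meta = inputs.flatten_without_attention_dispatch_metadata()
```

Under these assumptions, `flat` carries one more entry than `flat_no_meta`, namely the dispatch metadata value.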
kv_blocks
kv_blocks: _Buffer
kv_scales
kv_scales: _Buffer | None = None
lookup_table
lookup_table: _Tensor
max_lengths
max_lengths: _Tensor
unflatten()
unflatten(it)
Parameters:
- it
Return type:
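
The parameter name `it` suggests that `unflatten` consumes values from an iterator, though this page does not document its type or return value. The sketch below shows one way such a round trip could work, using a hypothetical `SketchInputs` dataclass and a hypothetical helper `unflatten_like`; neither is part of the MAX API, and the rule that `None` fields stay `None` is an assumption.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Iterator, Optional


@dataclass
class SketchInputs:
    """Hypothetical stand-in with the documented field names."""

    kv_blocks: Any
    cache_lengths: Any
    lookup_table: Any
    max_lengths: Any
    kv_scales: Optional[Any] = None
    attention_dispatch_metadata: Optional[Any] = None
    draft_attention_dispatch_metadata: Optional[Any] = None


def unflatten_like(template: SketchInputs, it: Iterator[Any]) -> SketchInputs:
    # Assumption: fields that are None in the template stay None; every
    # other field takes the next value from the iterator, in field order.
    updates = {
        f.name: next(it)
        for f in fields(template)
        if getattr(template, f.name) is not None
    }
    return replace(template, **updates)


template = SketchInputs("blocks", "lengths", "table", "max", kv_scales="scales")

# One flat stream holding values for two devices, back to back.
stream = iter(["b0", "c0", "t0", "m0", "s0", "b1", "c1", "t1", "m1", "s1"])
device0 = unflatten_like(template, stream)
device1 = unflatten_like(template, stream)
```

Passing an iterator rather than a list lets several per-device structures be rebuilt from a single flat argument sequence, each call picking up where the previous one stopped.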