Python class
KVCacheInputsPerDevice
class max.nn.kv_cache.KVCacheInputsPerDevice(kv_blocks, cache_lengths, lookup_table, max_lengths, kv_scales=None, attention_dispatch_metadata=None, draft_attention_dispatch_metadata=None, mla_num_partitions=None, draft_mla_num_partitions=None)
Bases: Generic[_Tensor, _Buffer]
Symbolic graph input types for a single device's paged KV cache.
Parameters:
- kv_blocks (_Buffer)
- cache_lengths (_Tensor)
- lookup_table (_Tensor)
- max_lengths (_Tensor)
- kv_scales (_Buffer | None)
- attention_dispatch_metadata (_Tensor | None)
- draft_attention_dispatch_metadata (_Tensor | None)
- mla_num_partitions (_Tensor | None)
- draft_mla_num_partitions (_Tensor | None)
attention_dispatch_metadata
attention_dispatch_metadata: _Tensor | None = None
cache_lengths
cache_lengths: _Tensor
draft_attention_dispatch_metadata
draft_attention_dispatch_metadata: _Tensor | None = None
draft_mla_num_partitions
draft_mla_num_partitions: _Tensor | None = None
flatten()
flatten()
Serialize fields into a flat list for graph input binding.
Ordering: [kv_blocks, cache_lengths, lookup_table, max_lengths,
kv_scales?, attention_dispatch_metadata?,
draft_attention_dispatch_metadata?, mla_num_partitions?,
draft_mla_num_partitions?]. Fields marked ? emit zero elements
when None; unflatten must consume next(it) in this exact
order.
Return type:
list[_Tensor | _Buffer]
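The ordering rule described for flatten can be illustrated with a stand-in dataclass. This is a minimal sketch, not the library's implementation: `FakeInputs` and the string placeholders are hypothetical, and only the field names and their ordering are taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for KVCacheInputsPerDevice. Plain strings play the
# role of tensors and buffers so the example runs without MAX installed.
@dataclass
class FakeInputs:
    kv_blocks: str
    cache_lengths: str
    lookup_table: str
    max_lengths: str
    kv_scales: Optional[str] = None
    attention_dispatch_metadata: Optional[str] = None
    draft_attention_dispatch_metadata: Optional[str] = None
    mla_num_partitions: Optional[str] = None
    draft_mla_num_partitions: Optional[str] = None

    def flatten(self) -> list:
        # The four required fields always appear, in this order.
        flat = [
            self.kv_blocks,
            self.cache_lengths,
            self.lookup_table,
            self.max_lengths,
        ]
        # Optional fields follow in the documented order and contribute
        # zero elements when they are None.
        optional = (
            self.kv_scales,
            self.attention_dispatch_metadata,
            self.draft_attention_dispatch_metadata,
            self.mla_num_partitions,
            self.draft_mla_num_partitions,
        )
        flat.extend(value for value in optional if value is not None)
        return flat


minimal = FakeInputs("blocks", "lengths", "table", "max")
with_meta = FakeInputs(
    "blocks", "lengths", "table", "max", attention_dispatch_metadata="meta"
)

flat_minimal = minimal.flatten()
flat_with_meta = with_meta.flatten()
print(len(flat_minimal), len(flat_with_meta))
```

Because absent optional fields leave no placeholder, the length of the flat list varies with the configuration, and a consumer can only interpret it if it knows which optional fields were set.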
flatten_without_attention_dispatch_metadata()
flatten_without_attention_dispatch_metadata()
Return type:
list[_Tensor | _Buffer]
kv_blocks
kv_blocks: _Buffer
kv_scales
kv_scales: _Buffer | None = None
lookup_table
lookup_table: _Tensor
max_lengths
max_lengths: _Tensor
mla_num_partitions
mla_num_partitions: _Tensor | None = None
unflatten()
unflatten(it)
Reconstruct from a flat iterator produced by flatten.
Consumes next(it) in the same order flatten emits elements;
the two methods must stay in lock-step.
Parameters:
Return type:
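The lock-step relationship between flatten and unflatten can be sketched as a round trip. This is a hedged illustration only: `FakeInputs` is a hypothetical stand-in, and the way the real method decides which optional fields to read is not stated on this page. The sketch assumes the instance that unflatten is called on acts as a template, so a field is read from the iterator only if it is set on that instance.

```python
from dataclasses import dataclass, fields, replace
from typing import Iterator, Optional


# Hypothetical stand-in for KVCacheInputsPerDevice; strings replace tensors.
@dataclass
class FakeInputs:
    kv_blocks: str
    cache_lengths: str
    lookup_table: str
    max_lengths: str
    kv_scales: Optional[str] = None
    attention_dispatch_metadata: Optional[str] = None
    draft_attention_dispatch_metadata: Optional[str] = None
    mla_num_partitions: Optional[str] = None
    draft_mla_num_partitions: Optional[str] = None

    def flatten(self) -> list:
        # Emit fields in declaration order, skipping any that are None.
        values = (getattr(self, f.name) for f in fields(self))
        return [v for v in values if v is not None]

    def unflatten(self, it: Iterator) -> "FakeInputs":
        # Assumption: self is the template. Call next(it) once for each
        # field that flatten would have emitted, in the same order.
        present = [f.name for f in fields(self) if getattr(self, f.name) is not None]
        return replace(self, **{name: next(it) for name in present})


template = FakeInputs("blocks", "lengths", "table", "max", kv_scales="scales")

# The iterator may carry further graph inputs after this device's elements.
it = iter(template.flatten() + ["extra"])
rebuilt = template.unflatten(it)
remaining = list(it)
print(rebuilt == template, remaining)
```

Taking an iterator rather than a list means unflatten consumes exactly the elements that belong to it and leaves the iterator positioned at the next input, which is why any change to the order in one method has to be mirrored in the other.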