AttentionDispatchMetadata
class max.nn.kv_cache.AttentionDispatchMetadata(tensor)
Bases: NestedIterableDataclass[_DispatchMetadataT], Generic[_DispatchMetadataT]
Wraps the scalar attention dispatch metadata tensor for a single device.
The wrapped tensor must have dtype int64 and rank 1. It encodes
the four dispatch scalars consumed by ragged decode kernels: batch size,
maximum query sequence length, number of partitions, and maximum cache
valid length.
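The four-scalar layout described above can be illustrated in plain Python without MAX. This is a minimal sketch, not the library's API. `DispatchScalars`, `pack_dispatch_scalars`, and `unpack_dispatch_scalars` are hypothetical names. The standard-library `array` type with typecode `"q"` (an 8-byte signed integer on common platforms) stands in for an int64, rank-1 tensor. The field order is assumed to match the order in which the description lists the scalars; the real kernel layout may differ.

```python
from array import array
from typing import NamedTuple


class DispatchScalars(NamedTuple):
    """Hypothetical named view of the four dispatch scalars.

    ASSUMPTION: field order mirrors the order listed in the docs
    (batch size, max query length, partitions, max cache valid length).
    """

    batch_size: int
    max_query_seq_len: int
    num_partitions: int
    max_cache_valid_length: int


def pack_dispatch_scalars(scalars: DispatchScalars) -> array:
    """Pack the four scalars into a flat (rank-1) 64-bit integer buffer."""
    buf = array("q", scalars)  # "q" = signed long long, 8 bytes on common platforms
    if buf.itemsize != 8:
        raise TypeError("expected 8-byte integers to match int64")
    return buf


def unpack_dispatch_scalars(buf: array) -> DispatchScalars:
    """Check dtype and length, then read the four scalars back out."""
    if buf.typecode != "q" or buf.itemsize != 8:
        raise TypeError("dispatch metadata must be int64")
    if len(buf) != 4:
        raise ValueError(f"expected 4 dispatch scalars, got {len(buf)}")
    return DispatchScalars(*buf)


# Usage: round-trip a decode step with 8 sequences, one query token each.
scalars = DispatchScalars(
    batch_size=8, max_query_seq_len=1, num_partitions=4, max_cache_valid_length=2048
)
packed = pack_dispatch_scalars(scalars)
assert unpack_dispatch_scalars(packed) == scalars
```

Keeping the scalars in one flat int64 buffer mirrors the constraint stated above (dtype int64, rank 1). The named tuple only serves to make the positional layout readable on the host side.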
Parameters:
tensor (_DispatchMetadataT): The dispatch metadata tensor to wrap. It must have dtype int64 and rank 1.
tensor: _DispatchMetadataT