
Python class

AttentionDispatchMetadata


class max.nn.kv_cache.AttentionDispatchMetadata(tensor)


Bases: NestedIterableDataclass[_DispatchMetadataT], Generic[_DispatchMetadataT]

Wraps the scalar attention dispatch metadata tensor for a single device.

The wrapped tensor must have dtype int64 and rank 1. It encodes the four dispatch scalars consumed by ragged decode kernels: batch size, maximum query sequence length, number of partitions, and maximum cache valid length.
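The following pure-Python sketch illustrates the layout described above: four scalars packed into a flat buffer of 64-bit signed integers. It does not use the MAX API. `DispatchScalars`, `pack`, and `unpack` are hypothetical names. The element order, which follows the order in which the scalars are listed above, is an assumption that this page does not confirm.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchScalars:
    """Hypothetical stand-in for the four dispatch scalars (not a MAX API)."""

    batch_size: int
    max_query_seq_len: int
    num_partitions: int
    max_cache_valid_length: int

    def pack(self) -> array:
        """Pack the scalars into a flat (rank-1) buffer of int64 values.

        Assumption: the element order matches the order in which the
        documentation lists the scalars.
        """
        # array("q", ...) stores signed 64-bit ints and raises
        # OverflowError if a value does not fit.
        return array(
            "q",
            (
                self.batch_size,
                self.max_query_seq_len,
                self.num_partitions,
                self.max_cache_valid_length,
            ),
        )

    @classmethod
    def unpack(cls, buf: array) -> "DispatchScalars":
        """Validate dtype and length, then rebuild the scalars."""
        if buf.typecode != "q" or buf.itemsize != 8:
            raise TypeError("expected a buffer of 64-bit signed integers")
        if len(buf) != 4:
            raise ValueError(f"expected 4 scalars, got {len(buf)}")
        return cls(*buf)


meta = DispatchScalars(
    batch_size=8, max_query_seq_len=1, num_partitions=4, max_cache_valid_length=2048
)
buf = meta.pack()
print(list(buf))  # [8, 1, 4, 2048]
```

A fixed-length int64 vector keeps all four values in one small buffer, so a kernel can read them together. Validating the dtype and length up front mirrors the int64, rank-1 requirement stated above.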

Parameters:

tensor (_DispatchMetadataT) – The int64, rank-1 tensor that holds the dispatch scalars for one device.


tensor: _DispatchMetadataT
