Python module

interfaces

General interface for Attention.

`AttentionImplQKV`

class max.nn.attention.interfaces.AttentionImplQKV(n_heads, kv_params, wq, wk, wv, wo, scale)

A generalized attention interface that will be used upstream by a general Transformer. Each variation of attention is expected to be articulated as a separate subclass, for example:

AttentionWithRope
AttentionWithAlibi
VanillaAttentionWithCausalMask
…

All variants share a common set of attributes, but an individual variant may need more. For example, we may introduce a RotaryEmbedding class for the AttentionWithRope class:

@dataclass
class AttentionWithRope(AttentionImplQKV):
    rope: RotaryEmbedding
    ...
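The subclassing pattern above can be tried without MAX installed. The following is a minimal sketch using plain dataclasses; `BaseAttention`, `RotaryEmbedding` (with its `dim` and `theta` fields), and `RopeAttention` are hypothetical stand-ins written for illustration, not the MAX classes, and it assumes the base class is itself a dataclass.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseAttention:
    """Hypothetical stand-in for the shared attributes of AttentionImplQKV."""

    n_heads: int
    scale: float


@dataclass
class RotaryEmbedding:
    """Hypothetical stand-in; the fields here are assumptions for illustration."""

    dim: int
    theta: float = 10000.0


@dataclass
class RopeAttention(BaseAttention):
    """Variant that adds one extra attribute on top of the shared ones."""

    rope: RotaryEmbedding


# Inherited fields come first in the generated constructor, followed by the
# fields declared on the subclass.
attn = RopeAttention(n_heads=8, scale=0.125, rope=RotaryEmbedding(dim=64))
field_names = [f.name for f in fields(attn)]
```

In this sketch the variant only declares what it adds, and the dataclass machinery merges the inherited fields into the constructor.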

We expect the __call__ abstract method to remain relatively consistent across variants. It also accepts **kwargs, so each variant can take additional arguments. For example, we may introduce a VanillaAttentionWithCausalMask class, which requires an attention mask:

@dataclass
class VanillaAttentionWithCausalMask(AttentionImplQKV):
    ...

    def __call__(
        self,
        x: TensorValueLike,
        kv_collection: PagedCacheValues,
        valid_lengths: TensorValueLike,
        **kwargs,
    ) -> tuple[TensorValue, PagedCacheValues]:
        if "attn_mask" not in kwargs:
            raise ValueError("attn_mask not provided to VanillaAttentionWithCausalMask")

        # The attention mask can then be used downstream like so
        # (`op` is a placeholder for the actual attention operation):
        op(
            attn_mask=kwargs["attn_mask"],
        )
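To make the **kwargs pattern concrete, here is a minimal, runnable sketch that does not depend on MAX. `AttentionBase` and `MaskedAttention` are hypothetical stand-ins, and the list-based "masking" is a toy substitute for a real attention kernel; only the validation of the extra keyword argument mirrors the example above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class AttentionBase(ABC):
    """Hypothetical stand-in for AttentionImplQKV; not the MAX class."""

    n_heads: int
    scale: float

    @abstractmethod
    def __call__(self, x: list[float], **kwargs: Any) -> list[float]:
        ...


@dataclass
class MaskedAttention(AttentionBase):
    """Hypothetical variant that requires an `attn_mask` keyword argument."""

    def __call__(self, x: list[float], **kwargs: Any) -> list[float]:
        # Fail early with a clear message if the variant-specific argument
        # is missing.
        if "attn_mask" not in kwargs:
            raise ValueError("attn_mask not provided to MaskedAttention")
        attn_mask = kwargs["attn_mask"]
        # Toy "masking": scale kept positions, zero out masked positions.
        return [v * self.scale if keep else 0.0 for v, keep in zip(x, attn_mask)]


layer = MaskedAttention(n_heads=2, scale=0.5)
masked = layer([2.0, 4.0, 6.0], attn_mask=[True, False, True])
```

Because the base signature stays fixed, a caller can pass the same positional arguments to every variant and supply variant-specific values by keyword only where they are needed.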

Parameters:

n_heads (int)
kv_params (KVCacheParams)
wq (Value[TensorType] | TensorValue | Shape | Dim | HasTensorValue | int | float | integer[Any] | floating[Any] | DLPackArray)
wk (Value[TensorType] | TensorValue | Shape | Dim | HasTensorValue | int | float | integer[Any] | floating[Any] | DLPackArray)
wv (Value[TensorType] | TensorValue | Shape | Dim | HasTensorValue | int | float | integer[Any] | floating[Any] | DLPackArray)
wo (LinearV1)
scale (float)

`kv_params`

kv_params: KVCacheParams

KV cache parameters, including the number of KV heads, the head dimension, and the data type.

`n_heads`

n_heads: int

The number of attention heads.

`scale`

scale: float

The scale factor for the attention.
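This page does not say how `scale` should be chosen. In standard scaled dot-product attention, a common convention is the reciprocal square root of the head dimension. The sketch below assumes that convention; `default_attention_scale` is a hypothetical helper, not part of MAX, and a given model may use a different value.

```python
import math


def default_attention_scale(head_dim: int) -> float:
    """Return 1 / sqrt(head_dim), the usual scaled dot-product factor."""
    if head_dim <= 0:
        raise ValueError("head_dim must be positive")
    return 1.0 / math.sqrt(head_dim)


# Example: a head dimension of 64 gives a scale of 1/8.
scale = default_attention_scale(64)
```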

`wk`

wk: Value[TensorType] | TensorValue | Shape | Dim | HasTensorValue | int | float | integer[Any] | floating[Any] | DLPackArray

The k weight vector.

`wo`

wo: LinearV1

A linear layer for the output projection.

`wq`

wq: Value[TensorType] | TensorValue | Shape | Dim | HasTensorValue | int | float | integer[Any] | floating[Any] | DLPackArray

The q weight vector.

`wv`

wv: Value[TensorType] | TensorValue | Shape | Dim | HasTensorValue | int | float | integer[Any] | floating[Any] | DLPackArray

The v weight vector.

`DistributedAttentionImpl`

class max.nn.attention.interfaces.DistributedAttentionImpl

A generalized distributed attention interface.
