Python module

multihead_attention

MultiheadAttention

class max.nn.legacy.attention.multihead_attention.MultiheadAttention(num_attention_heads, hidden_size, devices=None, dtype=float32, scale=None, qkv_has_bias=False, o_proj_has_bias=False, stacked_qkv=False)

Multihead attention that handles both single-device and distributed computation.

Parameters:

- num_attention_heads – The number of attention heads.
- hidden_size – The dimension of the hidden states.
- devices – Devices on which to place the weights and run the computation. Defaults to None.
- dtype – DType of the QKV and output projection weights. Defaults to float32.
- scale – Value used to scale the attention scores. Defaults to None.
- qkv_has_bias – Whether the QKV projection uses a bias. Defaults to False.
- o_proj_has_bias – Whether the output projection uses a bias. Defaults to False.
- stacked_qkv – Whether the QKV weights are stored as a single stacked weight. Defaults to False.
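
A minimal construction sketch follows. The import paths for DType and DeviceRef are assumptions based on the broader MAX Python API, and the head count and hidden size are arbitrary example values.

```python
# Construction sketch; `DType` and `DeviceRef` import paths are assumptions.
from max.dtype import DType
from max.graph import DeviceRef
from max.nn.legacy.attention.multihead_attention import MultiheadAttention

attn = MultiheadAttention(
    num_attention_heads=8,   # hidden_size should divide evenly by this
    hidden_size=512,
    devices=[DeviceRef.CPU()],
    dtype=DType.float32,
    qkv_has_bias=True,       # gives the layer a fused QKV bias
)
```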

wqkv

property wqkv: TensorValue

The concatenation of q, k, and v weight vectors.
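
To illustrate the fused layout conceptually, here is a NumPy sketch; it mirrors the idea behind wqkv, not MAX's internal storage. One matmul against the concatenated weights produces Q, K, and V together, which are then split apart.

```python
import numpy as np

# Conceptual sketch of a fused QKV weight, not MAX's internal layout.
hidden = 512
wq = np.random.randn(hidden, hidden).astype(np.float32)
wk = np.random.randn(hidden, hidden).astype(np.float32)
wv = np.random.randn(hidden, hidden).astype(np.float32)

wqkv = np.concatenate([wq, wk, wv], axis=-1)        # (hidden, 3 * hidden)

x = np.random.randn(4, hidden).astype(np.float32)   # (batch, hidden)
q, k, v = np.split(x @ wqkv, 3, axis=-1)            # one matmul, three outputs
```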

wqkv_bias

property wqkv_bias: TensorValue | None

The concatenation of the q, k, and v bias vectors, or None when the QKV projection has no bias.
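
In the same conceptual terms, the fused bias is simply the q, k, and v biases concatenated into one vector and added after the fused matmul; when qkv_has_bias=False the property is None and this addition is skipped. A self-contained NumPy sketch:

```python
import numpy as np

# Fused-bias sketch mirroring the idea of `wqkv_bias`, not MAX internals.
hidden = 512
wqkv = np.random.randn(hidden, 3 * hidden).astype(np.float32)
bq, bk, bv = (np.zeros(hidden, dtype=np.float32) for _ in range(3))

wqkv_bias = np.concatenate([bq, bk, bv])            # (3 * hidden,)

x = np.random.randn(4, hidden).astype(np.float32)
q, k, v = np.split(x @ wqkv + wqkv_bias, 3, axis=-1)
```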