Python module
attention_without_mask
An opaque, KV-cache-optimized vanilla attention mechanism. Rather than accepting an explicit attention-mask tensor, the chosen mask variant is applied inside the kernel itself.
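For intuition, the sketch below contrasts the two approaches in plain NumPy. It is a conceptual illustration only, not the MAX kernel: in the "without mask" form, the masking logic (here, causal) lives inside the attention function, so no mask tensor crosses the API boundary.

```python
import numpy as np

def attention_with_explicit_mask(q, k, v, mask):
    # Caller materializes a full (seq, seq) additive mask tensor.
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def attention_without_mask(q, k, v, mask_variant="causal"):
    # The mask variant is applied inside the kernel; the caller only
    # names the variant instead of passing a mask tensor.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask_variant == "causal":
        seq = scores.shape[-1]
        causal = np.tril(np.ones((seq, seq), dtype=bool))
        scores = np.where(causal, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Keeping the mask inside the kernel avoids materializing a seq × seq mask tensor per layer and lets the kernel fuse masking with the score computation.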
AttentionWithoutMask
class max.pipelines.nn.attention.attention_without_mask.AttentionWithoutMask(n_heads: 'int', kv_params: 'KVCacheParams', layer_idx: 'TensorValue', wqkv: 'TensorValue', wo: 'Linear', mask_variant: max.pipelines.nn.kernels.MHAMaskVariant)
mask_variant
mask_variant: MHAMaskVariant
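A minimal construction sketch following the signature above. The surrounding values (kv_params, layer_idx, wqkv, wo) are assumed to be produced elsewhere in the pipeline as KVCacheParams, TensorValue, TensorValue, and Linear respectively, and the MHAMaskVariant member name used here is an assumption:

```python
from max.pipelines.nn.attention.attention_without_mask import AttentionWithoutMask
from max.pipelines.nn.kernels import MHAMaskVariant

# kv_params, layer_idx, wqkv, and wo are placeholders assumed to come
# from the surrounding model-construction code.
attention = AttentionWithoutMask(
    n_heads=32,  # example head count
    kv_params=kv_params,
    layer_idx=layer_idx,
    wqkv=wqkv,
    wo=wo,
    mask_variant=MHAMaskVariant.CAUSAL_MASK,  # assumed enum member name
)
```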