Mojo package
attention
Multi-head attention (MHA) and multi-head latent attention (MLA) kernels.
Shared utilities live here (masks, operands, config), with platform-specific implementations under cpu/ and gpu/.
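For context, both kernel families compute variants of scaled dot-product attention. A standard formulation (conventional notation, not taken from this package's API) is:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

Multi-head attention applies this in parallel across several projected heads and concatenates the results; multi-head latent attention additionally compresses the key/value cache through a shared low-rank latent projection.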
Packages
- cpu: CPU flash-attention implementation for multi-head attention.
- gpu: GPU multi-head attention (MHA), cross-attention, and multi-head latent attention (MLA) kernels. Vendor-specific implementations live under amd/ and nvidia/.
Modules