Mojo package
gpu
GPU multi-head attention (MHA), cross-attention, and multi-head latent attention (MLA) kernels. Vendor-specific implementations live under amd/ and nvidia/.
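As a point of reference for what these kernels compute, the following is a minimal NumPy sketch of the multi-head attention forward pass (softmax(QK^T / sqrt(d)) V per head). The function name, shapes, and layout are illustrative assumptions, not the Mojo kernel API.

```python
import numpy as np

def mha_reference(q, k, v, num_heads):
    """Naive multi-head attention reference.

    q, k, v: [seq_len, num_heads * head_dim] arrays.
    Returns: [seq_len, num_heads * head_dim].
    """
    seq_len, model_dim = q.shape
    head_dim = model_dim // num_heads

    # Split into heads: [num_heads, seq_len, head_dim].
    def split(x):
        return x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)

    # Scaled dot-product scores per head: [num_heads, seq_len, seq_len].
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(head_dim)

    # Numerically stable row-wise softmax.
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Weighted sum of values, then merge heads back.
    out = probs @ vh  # [num_heads, seq_len, head_dim]
    return out.transpose(1, 0, 2).reshape(seq_len, model_dim)
```

The GPU kernels in this package fuse and tile this computation; the sketch only documents the math they implement.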
Packages
- amd: AMD GPU attention kernels for CDNA (GFX942/GFX950) and RDNA architectures.
- amd_structured: Structured AMD GPU attention kernels (TileTensor hot path).
- nvidia: NVIDIA GPU attention kernels and tile-scheduling utilities.
Modules
- mha
- mha_cross
- mha_decode_partition_heuristic
- mla
- mla_graph
- mla_index_fp8: MLA FP8 index kernel for computing attention scores with paged KV cache (see the sketch below).
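To illustrate the paged-KV-cache pattern the mla_index_fp8 description refers to, here is a hypothetical NumPy sketch: keys are stored in fixed-size physical blocks, a per-sequence block table maps logical token positions to those blocks, and attention scores are computed by gathering keys through that indirection. All names, shapes, and the float32 precision are assumptions for illustration, not the kernel's actual FP8 interface.

```python
import numpy as np

def paged_attention_scores(q, key_cache, block_table, seq_len, block_size):
    """Score a single query against the first seq_len cached tokens.

    q:           [head_dim] query vector.
    key_cache:   [num_blocks, block_size, head_dim] paged key storage.
    block_table: physical block index for each logical block of this sequence.
    """
    head_dim = q.shape[0]
    scores = np.empty(seq_len, dtype=np.float32)
    for pos in range(seq_len):
        block = block_table[pos // block_size]   # logical -> physical block
        offset = pos % block_size                # slot within the block
        k = key_cache[block, offset]             # gather the key vector
        scores[pos] = q @ k / np.sqrt(head_dim)  # scaled dot product
    return scores
```

A real kernel would vectorize the gather, work on quantized (FP8) keys, and fuse the softmax, but the block-table indirection shown here is the core of paged KV-cache indexing.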