Mojo package
gpu
GPU multi-head attention (MHA), cross-attention, and multi-head latent attention (MLA) kernels. Vendor-specific implementations live in the amd_rdna, amd_structured, and nvidia subpackages.
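As background (this is the standard definition, not part of this package's API), every kernel here computes some variant of scaled dot-product attention. For a single head with query, key, and value matrices Q, K, and V and head dimension d_k:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

In cross-attention, Q comes from one sequence while K and V come from another; MLA additionally compresses keys and values through a shared low-rank latent projection to shrink the KV cache.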
Packages
- amd_rdna: TileTensor-native attention kernels for AMD RDNA3+ (gfx11xx/gfx12xx).
- amd_structured: TileTensor-native attention kernels for AMD gfx950 (MI355X).
- nvidia: NVIDIA GPU attention kernels and tile-scheduling utilities.
Modules
- mha: Multi-head attention (MHA) kernels.
- mha_cross: Cross-attention kernels.
- mha_decode_partition_heuristic: Heuristic for partitioning MHA decode work.
- mla: Multi-head latent attention (MLA) kernels.
- mla_graph:
- mla_index_fp8: MLA FP8 index kernel for computing attention scores with a paged KV cache.