Mojo package
gpu
GPU multi-head attention (MHA), cross-attention, and multi-head latent attention (MLA) kernels. Vendor-specific implementations live in the amd_rdna, amd_structured, and nvidia subpackages.
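As background (this is the standard definition, not part of this package's API), every kernel here computes some variant of scaled dot-product attention. For a single head with query, key, and value matrices Q, K, and V and head dimension d_k:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

In cross-attention, Q comes from one sequence while K and V come from another; MLA additionally compresses keys and values through a shared low-rank latent projection to shrink the KV cache.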
Packages
- amd_rdna: TileTensor-native attention kernels for AMD RDNA3+ (gfx11xx/gfx12xx).
- amd_structured: TileTensor-native attention kernels for AMD gfx950 (MI355X).
- nvidia: NVIDIA GPU attention kernels and tile-scheduling utilities.
Modules
- mha: Multi-head attention (MHA) kernels.
- mha_cross: Cross-attention kernels.
- mha_decode_partition_heuristic: Heuristic for partitioning MHA decode work.
- mla: Multi-head latent attention (MLA) kernels.
- mla_graph:
- mla_index_fp8: MLA FP8 index kernel for computing attention scores with a paged KV cache.