Mojo package
amd_structured
TileTensor-native attention kernels for AMD gfx950 (MI355X).
This module provides a gfx950-only attention implementation using TileTensor throughout. It supports MHA prefill (depth = 64, 128, 256, 512), MHA decode (token generation), MLA prefill, and MLA decode.
Modules

- attention: Attention struct for gfx950 MHA/MLA kernels (prefill + decode).
- buffers: Q, P, and Output register buffers for gfx950 attention kernels.
- config: GFX950 attention config.
- kv_buffer: KV cache buffers for MHA/MLA prefill and decode kernels.
- mask_op: Named TileOp struct for QK score masking on gfx950.
- mha_decode: Gfx950 MHA decode kernel, built on KVBuffer.
- mha_decode_streaming: MHA streaming decode kernel for gfx950.
- mha_prefill: Unified gfx950 MHA prefill kernel.
- mla_decode: MLA (Multi-Latent Attention) decode kernel for gfx950.
- mla_prefill: MLA (Multi-Latent Attention) prefill kernel for gfx950.
- mma:
- softmax: Online softmax for gfx950 attention kernels.
- utils: Shared helpers for gfx950 attention kernels.
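The softmax module implements online softmax, the streaming normalization technique used by fused attention kernels so that attention scores can be processed block by block without materializing the full score row. The Mojo implementation in this package is not shown here, so the following is only a minimal Python sketch of the underlying algorithm; all names are illustrative:

```python
import math

def online_softmax(score_blocks):
    """Streaming softmax over blocks of scores.

    Maintains a running max `m` and running denominator `l`; when a new
    block raises the max, the existing denominator is rescaled by
    exp(old_max - new_max) before the block's terms are added. This is
    the rescaling step that fused attention kernels apply per KV tile.
    """
    m = float("-inf")  # running max of all scores seen so far
    l = 0.0            # running sum of exp(score - m)
    scores = []
    for block in score_blocks:
        new_m = max(m, max(block))
        # Rescale the old denominator to the new max, then fold in the block.
        l = l * math.exp(m - new_m) + sum(math.exp(s - new_m) for s in block)
        m = new_m
        scores.extend(block)
    return [math.exp(s - m) / l for s in scores]
```

Processing the blocks `[[1.0, 2.0], [3.0]]` yields the same result as a one-shot softmax over `[1.0, 2.0, 3.0]`, which is what makes the technique safe to apply tile by tile.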