Mojo package
amd_structured
TileTensor-native attention kernels for AMD gfx950 (MI355X).
This module provides a gfx950-only attention implementation using TileTensor throughout. It supports MHA prefill (depth = 64, 128, 256, 512), MHA decode (token generation), MLA prefill, and MLA decode.
Modules

- attention: Attention struct for gfx950 MHA/MLA kernels (prefill + decode).
- buffers: Q, P, and Output register buffers for gfx950 attention kernels.
- config: GFX950 attention config.
- kv_buffer: KV cache buffers for MHA/MLA prefill and decode kernels.
- mask_op: Named TileOp struct for QK score masking on gfx950.
- mha_decode: Gfx950 MHA decode kernel, built on KVBuffer.
- mha_decode_streaming: MHA streaming decode kernel for gfx950.
- mha_prefill: Unified gfx950 MHA prefill kernel.
- mla_decode: MLA (Multi-Latent Attention) decode kernel for gfx950.
- mla_prefill: MLA (Multi-Latent Attention) prefill kernel for gfx950.
- mma:
- softmax: Online softmax for gfx950 attention kernels.
- utils: Shared helpers for gfx950 attention kernels.
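The softmax module implements online softmax, the streaming normalization technique used by fused attention kernels so that attention scores can be processed block by block without materializing the full score row. The Mojo implementation in this package is not shown here, so the following is only a minimal Python sketch of the underlying algorithm; all names are illustrative:

```python
import math

def online_softmax(score_blocks):
    """Streaming softmax over blocks of scores.

    Maintains a running max `m` and running denominator `l`; when a new
    block raises the max, the existing denominator is rescaled by
    exp(old_max - new_max) before the block's terms are added. This is
    the rescaling step that fused attention kernels apply per KV tile.
    """
    m = float("-inf")  # running max of all scores seen so far
    l = 0.0            # running sum of exp(score - m)
    scores = []
    for block in score_blocks:
        new_m = max(m, max(block))
        # Rescale the old denominator to the new max, then fold in the block.
        l = l * math.exp(m - new_m) + sum(math.exp(s - new_m) for s in block)
        m = new_m
        scores.extend(block)
    return [math.exp(s - m) / l for s in scores]
```

Processing the blocks `[[1.0, 2.0], [3.0]]` yields the same result as a one-shot softmax over `[1.0, 2.0, 3.0]`, which is what makes the technique safe to apply tile by tile.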