Mojo module
mla_decode
MLA (Multi-Latent Attention) decode kernel for gfx950.
Thin wrapper that delegates to mha_decode. Self.mla_mode=True
produces MLA-style coords (kv_head_idx=0, q_tile_idx=block_idx.y) via
AMDStructuredConfig.
Was this page helpful?
Thank you! We'll create more content like this.
Thank you for helping us improve!