Mojo module

mla_decode

MLA (Multi-head Latent Attention) decode kernel for gfx950.

A thin wrapper that delegates to mha_decode. Setting Self.mla_mode=True makes AMDStructuredConfig produce MLA-style coordinates: kv_head_idx=0 and q_tile_idx=block_idx.y.
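The coordinate mapping described above can be illustrated with a small Python sketch. This is not the module's Mojo source. The names `TileCoords` and `decode_coords` are hypothetical, and the non-MLA branch is an assumption included only for contrast. Only the MLA branch (kv_head_idx=0, q_tile_idx=block_idx.y) comes from the description on this page.

```python
from typing import NamedTuple


class TileCoords(NamedTuple):
    """Hypothetical container for the two coordinates named in the docs."""
    kv_head_idx: int
    q_tile_idx: int


def decode_coords(block_idx_y: int, mla_mode: bool) -> TileCoords:
    """Map the y block index to (kv_head_idx, q_tile_idx)."""
    if mla_mode:
        # Documented MLA behavior: every block reads KV head 0, and
        # block_idx.y selects the query tile.
        return TileCoords(kv_head_idx=0, q_tile_idx=block_idx_y)
    # ASSUMPTION (not taken from the docs): a plain decode path in which
    # block_idx.y selects the KV head and there is a single query tile.
    return TileCoords(kv_head_idx=block_idx_y, q_tile_idx=0)


if __name__ == "__main__":
    for y in range(3):
        print(y, decode_coords(y, mla_mode=True))
```

In MLA mode every block maps to the same KV head index, so only the query tile varies across the grid's y dimension.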
