
Mojo module

mla_decode

MLA (Multi-head Latent Attention) decode kernel for gfx950.

Thin wrapper that delegates to mha_decode. Setting Self.mla_mode=True makes AMDStructuredConfig produce MLA-style coordinates (kv_head_idx=0, q_tile_idx=block_idx.y).
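
The MLA-mode coordinate mapping can be pictured with a small host-side model. This is a sketch in Python, not the kernel's Mojo code: BlockIdx, TileCoords, and mla_coords are hypothetical names introduced only for illustration, and only the two assignments stated above (kv_head_idx=0, q_tile_idx=block_idx.y) come from this page. How AMDStructuredConfig derives coordinates outside MLA mode is not described here, so it is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockIdx:
    """Hypothetical stand-in for the GPU block index (x, y, z)."""
    x: int = 0
    y: int = 0
    z: int = 0


@dataclass(frozen=True)
class TileCoords:
    """Hypothetical container for the two coordinates named in the docs."""
    kv_head_idx: int
    q_tile_idx: int


def mla_coords(block_idx: BlockIdx) -> TileCoords:
    # MLA mode as documented: the KV head index is fixed at 0,
    # and the query tile is selected by the block's y index.
    return TileCoords(kv_head_idx=0, q_tile_idx=block_idx.y)


if __name__ == "__main__":
    # Blocks that differ only in y map to different query tiles
    # but always to KV head 0.
    for y in range(3):
        print(y, mla_coords(BlockIdx(x=5, y=y)))
```

In this model, block_idx.x and block_idx.z do not affect the result. The real kernel may use them for other purposes, such as batch or partition indexing, which this page does not specify.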
