Mojo module
config
GFX950 attention config.
Supports both prefill (token_gen=False) and decode (token_gen=True).
Matches the amd/mha.mojo config target: full_kv=True and depth_padded=False in both modes. Prefill sets double_buffer=True. Decode sets double_buffer=False, enables double_buffer_k_only when BN<=64, and enables shared_kv only at depth>256 (because of the SMEM budget).
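The selection rules above can be restated as a small decision function. The following is a minimal Python sketch for illustration only, not the module's Mojo API: the names `AttentionConfigSketch` and `select_config` are hypothetical, and the prefill values of `double_buffer_k_only` and `shared_kv` are assumed to be `False` because the description does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfigSketch:
    """Hypothetical mirror of the flags named in the description."""

    token_gen: bool
    full_kv: bool
    depth_padded: bool
    double_buffer: bool
    double_buffer_k_only: bool
    shared_kv: bool


def select_config(token_gen: bool, bn: int, depth: int) -> AttentionConfigSketch:
    """Pick flags for prefill (token_gen=False) or decode (token_gen=True)."""
    if not token_gen:
        # Prefill: double buffering on.
        # ASSUMPTION: the two flags below are not specified for prefill,
        # so this sketch leaves them off.
        return AttentionConfigSketch(
            token_gen=False,
            full_kv=True,
            depth_padded=False,
            double_buffer=True,
            double_buffer_k_only=False,
            shared_kv=False,
        )
    # Decode: double buffering off; K-only double buffering for small BN;
    # shared K/V storage only at large depth (SMEM budget).
    return AttentionConfigSketch(
        token_gen=True,
        full_kv=True,
        depth_padded=False,
        double_buffer=False,
        double_buffer_k_only=bn <= 64,
        shared_kv=depth > 256,
    )
```

For example, `select_config(token_gen=True, bn=64, depth=128)` enables `double_buffer_k_only` but leaves `shared_kv` off, since the depth does not exceed 256.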
Structs