Mojo module
mla_decode_qkv_fp8_layout_g
Native FP8 MLA decode kernel for SM100 (B200): Layout G fold path.
The qkv=fp8 / BM=32 / MMA_M=32 / 1x4 datapath specialisations (softmax, correction, output store, TMA/MMA descriptor sizing) live in this module so that they can evolve without disturbing the Layout E kernel.
Activated by the dispatcher constructing
MLA_SM100_Decode_Config(decode_layout_g=True). Key shapes:
BM=32, BN_QK=64, BK_QKT=64, BK_PV=64, BN_PV=256, MMA_M=32,
num_kv_stages=5, cta_group=1.
SMEM layout (BN_QK=64, 5 stages):
  Q FP8:      32 x 576 x 1 = 18432 B (SWIZZLE_64B)
  KV stages:  5 x 64 x 576 x 1 = 184320 B (SWIZZLE_64B)
  P stages:   5 x 32 x 64 x 1 = 10240 B (SWIZZLE_64B)
  max/li:     128 x 4 x 3 = 1536 B
  barriers:   (6N+11) fixed + output barriers
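The byte counts in the SMEM layout can be re-derived from the key shapes with plain arithmetic. The sketch below is Python used only as a calculator; it is not part of the Mojo module. The names `ROW_WIDTH`, `FP8_BYTES`, and `smem_regions` are assumptions introduced for illustration, and the `max/li` factors are copied as given without interpreting them. Barrier storage is left out because the text does not give a byte count for it.

```python
# Re-derive the SMEM region sizes listed in the layout (all sizes in bytes).
BM = 32             # Q tile rows (from "Key shapes")
BN_QK = 64          # KV tile rows per stage (from "Key shapes")
NUM_KV_STAGES = 5   # from "Key shapes"
ROW_WIDTH = 576     # assumed name for the 576-wide Q/KV row in the layout
FP8_BYTES = 1       # one byte per FP8 element


def smem_regions():
    """Return the size of each SMEM region, excluding barriers."""
    return {
        "q": BM * ROW_WIDTH * FP8_BYTES,
        "kv": NUM_KV_STAGES * BN_QK * ROW_WIDTH * FP8_BYTES,
        "p": NUM_KV_STAGES * BM * BN_QK * FP8_BYTES,
        "max_li": 128 * 4 * 3,  # factors exactly as listed in the layout
    }


regions = smem_regions()
total = sum(regions.values())

for name, size in regions.items():
    print(f"{name:<8}{size:>8} B")
print(f"{'total':<8}{total:>8} B (barriers excluded)")
```

The four listed regions add up to 214528 B, and the KV stages account for most of that, so `num_kv_stages` and `BN_QK` are the parameters that move the budget most.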
Structs