
Mojo module

mla_decode_qkv_fp8_layout_g

Native FP8 MLA decode kernel for SM100 (B200): Layout G fold path.

Keeping this path in its own module lets the qkv=fp8 / BM=32 / MMA_M=32 / 1x4 datapath specialisations (softmax, correction, output store, TMA/MMA descriptor sizing) evolve without disturbing the Layout E kernel.

Activated by the dispatcher constructing MLA_SM100_Decode_Config(decode_layout_g=True). Key shapes: BM=32, BN_QK=64, BK_QKT=64, BK_PV=64, BN_PV=256, MMA_M=32, num_kv_stages=5, cta_group=1.

SMEM layout (BN_QK=64, 5 stages):

    Q FP8:      32 x 576 x 1      = 18432 B  (SWIZZLE_64B)
    KV stages:  5 x 64 x 576 x 1  = 184320 B (SWIZZLE_64B)
    P stages:   5 x 32 x 64 x 1   = 10240 B  (SWIZZLE_64B)
    max/li:     128 x 4 x 3       = 1536 B
    barriers:   (6N+11) fixed + output barriers
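The byte counts in the SMEM layout can be rechecked with a few lines of arithmetic. The sketch below is plain Python, not part of the Mojo module: the names `ROW_ELEMS`, `FP8_BYTES`, and `smem_budget` are hypothetical, the meaning attached to each factor is an assumption read off the layout lines, and barrier storage is left out because its size depends on N and on the output barriers.

```python
# Shapes quoted in the module description.
BM = 32            # rows in the Q tile (BM=32)
BN_QK = 64         # rows per KV stage (BN_QK=64)
NUM_KV_STAGES = 5  # num_kv_stages=5

# Assumed readings of the remaining factors in the layout lines.
ROW_ELEMS = 576    # elements per Q/KV row, as written in the layout
FP8_BYTES = 1      # one byte per FP8 element


def smem_budget() -> dict:
    """Return the per-buffer shared-memory sizes in bytes (barriers excluded)."""
    return {
        "q_fp8": BM * ROW_ELEMS * FP8_BYTES,
        "kv_stages": NUM_KV_STAGES * BN_QK * ROW_ELEMS * FP8_BYTES,
        "p_stages": NUM_KV_STAGES * BM * BN_QK * FP8_BYTES,
        # Factors copied verbatim from the layout line "128 x 4 x 3".
        "max_li": 128 * 4 * 3,
    }


budget = smem_budget()
total = sum(budget.values())

for name, size in budget.items():
    print(f"{name:10s} {size:7d} B")
print(f"{'total':10s} {total:7d} B (excluding barriers)")
```

The four buffers sum to 214528 B, so the KV stages account for most of the footprint; changing `NUM_KV_STAGES` or `BN_QK` in the sketch shows how quickly the total moves.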

Structs