Mojo module
mxfp4_grouped_matmul_amd
Native MXFP4 grouped matmul on AMD CDNA4 via block-scaled MFMA.
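MXFP4 is the OCP Microscaling format. Each element is a 4-bit E2M1 value, and each block of 32 elements shares one 8-bit E8M0 power-of-two scale. Block-scaled MFMA applies that shared scale in hardware. The plain-Python decoder below is a minimal sketch of what one block represents numerically. The helper names and the nibble order (low nibble first) are assumptions for illustration and are not taken from this module.

```python
# Hypothetical MXFP4 block decoder following the OCP MX format:
# 32 FP4 (E2M1) elements sharing one E8M0 scale.
# The nibble order (low nibble = earlier element) is an assumption.

# Magnitudes of the 8 non-negative E2M1 codes; bit 3 is the sign bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
MX_BLOCK_SIZE = 32


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a float."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 shared scale: 2**(e - 127); 0xFF encodes NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequantize_mxfp4_block(packed: bytes, scale_byte: int) -> list:
    """Dequantize one block of 32 FP4 values packed two per byte."""
    if len(packed) != MX_BLOCK_SIZE // 2:
        raise ValueError("an MXFP4 block holds 32 elements in 16 bytes")
    scale = decode_e8m0(scale_byte)
    values = []
    for byte in packed:
        # Assumed packing: low nibble first, then high nibble.
        values.append(decode_e2m1(byte & 0x0F) * scale)
        values.append(decode_e2m1(byte >> 4) * scale)
    return values
```

For example, with scale byte 128 (a scale of 2.0), the packed byte 0x2A decodes to -2.0 followed by 2.0.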
Grouped matmul for Mixture of Experts (MoE):

    for i in range(num_active_experts):
        C[offsets[i]:offsets[i+1], :] = A[offsets[i]:offsets[i+1], :] @ B[expert_ids[i], :, :].T
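The loop above can be checked against a plain-Python reference. This sketch works on already-dequantized float values and ignores MXFP4 scaling. The function name `grouped_matmul_reference` is hypothetical, and the shapes assume A is [tokens][K], B is [experts][N][K], and C is [tokens][N].

```python
# Hypothetical float reference for the grouped-matmul semantics:
# rows offsets[i]..offsets[i+1] of A are multiplied by B[expert_ids[i]].T.

def grouped_matmul_reference(A, B, offsets, expert_ids):
    num_active_experts = len(expert_ids)
    if len(offsets) != num_active_experts + 1:
        raise ValueError("offsets must have num_active_experts + 1 entries")
    n = len(B[0])  # N: output columns = rows of each expert's weight matrix
    C = [[0.0] * n for _ in range(len(A))]
    for i in range(num_active_experts):
        weights = B[expert_ids[i]]  # [N][K]
        for row in range(offsets[i], offsets[i + 1]):
            for col in range(n):
                # (A @ W.T)[row, col] == dot(A[row, :], W[col, :])
                C[row][col] = sum(a * w for a, w in zip(A[row], weights[col]))
    return C


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[[1.0, 0.0], [0.0, 1.0]],   # expert 0: identity
     [[1.0, 1.0], [2.0, 0.0]]]   # expert 1
C = grouped_matmul_reference(A, B, offsets=[0, 1, 3], expert_ids=[1, 0])
```

In this example, row 0 is routed to expert 1 and rows 1 and 2 are routed to expert 0, so C is [[3.0, 2.0], [3.0, 4.0], [5.0, 6.0]]. Rows not covered by any segment are left at zero.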
Experts are dispatched along the grid's z dimension (block_idx.z), and each expert's slice is computed with MXFP4MatmulAMD.run.
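One way z-dimension dispatch can map blocks to work is sketched below in plain Python. The layout is an assumption for illustration and is not confirmed by this page: block_idx.x tiles N, block_idx.y tiles rows within one expert's segment, and block_idx.z selects the active expert. The names `block_work` and `launch_grid` and the tile sizes are hypothetical.

```python
# Hypothetical mapping of a 3-D launch grid onto grouped-matmul work.
# Assumed layout: x tiles N, y tiles rows of one segment, z = active expert.

def block_work(block_x, block_y, block_z, offsets, expert_ids, n, tile_m, tile_n):
    """Return (expert, row_range, col_range) for one block, or None if it has no work."""
    row_start = offsets[block_z] + block_y * tile_m
    row_end = min(row_start + tile_m, offsets[block_z + 1])
    if row_start >= row_end:
        return None  # grid.y is sized for the largest segment; smaller ones exit early
    col_start = block_x * tile_n
    col_end = min(col_start + tile_n, n)
    return expert_ids[block_z], range(row_start, row_end), range(col_start, col_end)


def launch_grid(offsets, n, tile_m, tile_n):
    """Grid dims: N tiles, row tiles of the largest segment, one z slice per expert."""
    num_active_experts = len(offsets) - 1
    max_rows = max(offsets[i + 1] - offsets[i] for i in range(num_active_experts))
    grid_x = -(-n // tile_n)         # ceil(n / tile_n)
    grid_y = -(-max_rows // tile_m)  # ceil(max_rows / tile_m)
    return grid_x, grid_y, num_active_experts
```

Because every z slice gets the same y extent, blocks past the end of a short segment return immediately. This is a common trade-off that keeps the launch a single regular grid.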
Entry point: mxfp4_grouped_matmul_amd()
Functions
- mxfp4_grouped_matmul_amd: Launch native MXFP4 grouped matmul on AMD CDNA4.
- mxfp4_grouped_matmul_amd_kernel: MXFP4 grouped matmul kernel with expert dispatch via block_idx.z.