Mojo package
amd
Provides the AMD GPU backend implementations for matmuls.
Modules
- amd_4wave_matmul: 4-wave matmul for AMD MI355X (CDNA4).
- amd_4wave_schedule: Inline 4-wave schedule for AMD GPU matmul / implicit-GEMM conv kernels.
- amd_4wave_split_k_matmul: Single-launch split-K wrapper for the 4-wave FP8 matmul.
- amd_matmul: Pure TileTensor structured AMD matmul kernel.
- amd_matmul_schedule: Declarative software pipeline schedule for the default AMD matmul kernel.
- amd_ping_pong_matmul: Structured ping-pong matmul for AMD MI355X (CDNA4).
- amd_ping_pong_schedule: Ping-pong schedule for AMD GPU matmul kernels.
- amd_target: AMD GPU target definitions for the pipeline scheduling framework.
- matmul_mma: MMA operators for AMD matmul kernels.
- mxfp4_dequant_matmul_amd: MXFP4 matmul on AMD CDNA GPUs via dequant-to-FP8 + FP8 GEMM.
- mxfp4_grouped_matmul_amd
- mxfp4_matmul_amd: Native MXFP4 block-scaled matmul on AMD CDNA4 via f8f6f4 MFMA.
- mxfp4_matmul_amd_preb: MXFP4 block-scaled matmul on AMD CDNA4 with preshuffled B + scales + direct VGPR loads.
- mxfp4_moe_matmul_amd: MXFP4 x MXFP4 routed MoE matmul kernel for AMD CDNA4.
- mxfp4_preshuffle_layouts: Host-side MXFP4 preshuffle layouts for AMD CDNA4 grouped MoE matmul.
- mxfp4_preshuffle_loaders: Per-lane DRAM->VGPR loaders for the preshuffled MXFP4 MoE matmul.
- pipeline_body: Builder for declarative pipeline body specifications.
- ring_buffer: Ring Buffer implementation for producer-consumer synchronization in GPU kernels.
- ring_buffer_traits: Trait definitions and utilities for ring buffer synchronization strategies.
- structured
- warp_spec_matmul: AMD Warp-Specialized Matrix Multiplication.