Mojo package
amd
Provides the AMD GPU backend implementations for matmuls.
Modules
- amd_4wave_matmul: 4-wave FP8 matmul for AMD MI355X (CDNA4).
- amd_4wave_schedule: Inline 4-wave schedule for AMD GPU FP8 matmul kernels.
- amd_4wave_split_k_matmul: Single-launch split-K wrapper for the 4-wave FP8 matmul.
- amd_matmul: Pure TileTensor structured AMD matmul kernel.
- amd_matmul_schedule: Declarative software pipeline schedule for the default AMD matmul kernel.
- amd_ping_pong_matmul: Structured ping-pong matmul for AMD MI355X (CDNA4).
- amd_ping_pong_schedule: Ping-pong schedule for AMD GPU matmul kernels.
- amd_target: AMD GPU target definitions for the pipeline scheduling framework.
- matmul_mma: MMA operators for AMD matmul kernels.
- mxfp4_dequant_grouped_matmul_amd: MXFP4 grouped matmul on AMD CDNA GPUs via dequant-to-FP8 + FP8 grouped GEMM.
- mxfp4_dequant_matmul_amd: MXFP4 matmul on AMD CDNA GPUs via dequant-to-FP8 + FP8 GEMM.
- mxfp4_grouped_matmul_amd: Native MXFP4 grouped matmul on AMD CDNA4 via block-scaled MFMA.
- mxfp4_matmul_amd: Native MXFP4 block-scaled matmul on AMD CDNA4 via f8f6f4 MFMA.
- pipeline_body: Builder for declarative pipeline body specifications.
- ring_buffer: Ring buffer implementation for producer-consumer synchronization in GPU kernels.
- ring_buffer_traits: Trait definitions and utilities for ring buffer synchronization strategies.
- structured
- warp_spec_matmul: AMD warp-specialized matrix multiplication.