Mojo package
amd
Provides the AMD GPU backend implementations for matmuls.
Modules
- amd_matmul: Pure TileTensor structured AMD matmul kernel.
- amd_matmul_schedule: Declarative software pipeline schedule for the default AMD matmul kernel.
- amd_ping_pong_matmul: Structured ping-pong matmul for AMD MI355X (CDNA4).
- amd_ping_pong_schedule: Ping-pong schedule for AMD GPU matmul kernels.
- amd_target: AMD GPU target definitions for the pipeline scheduling framework.
- matmul_mma: MMA and data-movement helpers for AMD matmul kernels.
- mxfp4_dequant_matmul_amd: MXFP4 matmul on AMD CDNA GPUs via dequant-to-FP8 + FP8 GEMM.
- mxfp4_grouped_matmul_amd: MXFP4 grouped matmul on AMD CDNA GPUs via dequant-to-FP8 + FP8 grouped GEMM.
- mxfp4_matmul_amd: Native MXFP4 block-scaled matmul on AMD CDNA4 via f8f6f4 MFMA.
- pipeline_body: Builder for declarative pipeline body specifications.
- ring_buffer: Ring Buffer implementation for producer-consumer synchronization in GPU kernels.
- ring_buffer_traits: Trait definitions and utilities for ring buffer synchronization strategies.
- structured
- warp_spec_matmul: AMD Warp-Specialized Matrix Multiplication.