
Mojo package

rdna

RDNA Conv2D via implicit GEMM (fused im2col + WMMA matmul).

High-performance Conv2D for AMD RDNA 3+ GPUs. Fuses im2col coordinate computation into the WMMA matmul kernel's A-tile loader, eliminating the large intermediate im2col buffer.

Supported: Conv2D fprop with stride=1, dilation=1, BF16/FP16.
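The fusion described above can be illustrated with a minimal scalar sketch in Python. It is not the package's Mojo kernel and uses no WMMA instructions; it only shows how an implicit GEMM computes each A-matrix element's input coordinate on the fly instead of reading it from a materialized im2col buffer. The function names (`im2col_coord`, `conv2d_implicit_gemm`), the NHWC input layout, the KRSC filter layout, and the zero-padding parameter are assumptions made for illustration; stride=1 and dilation=1 match the supported configuration.

```python
def im2col_coord(m, k, h_out, w_out, s_dim, c_dim, pad=0):
    """Map GEMM A-matrix element (m, k) to an input coordinate (n, h, w, c).

    Assumes NHWC input, filter taps ordered (r, s, c), stride=1, dilation=1.
    """
    n, rem = divmod(m, h_out * w_out)      # batch index
    p, q = divmod(rem, w_out)              # output row, output column
    r, rem_k = divmod(k, s_dim * c_dim)    # filter row
    s, c = divmod(rem_k, c_dim)            # filter column, input channel
    return n, p + r - pad, q + s - pad, c


def conv2d_implicit_gemm(x, w, pad=0):
    """Conv2D forward pass as a GEMM whose A operand is never materialized.

    x: nested lists indexed [N][H][W][C]; w: nested lists indexed [K][R][S][C].
    Returns a row-major (N*H_out*W_out) x K matrix as a list of rows.
    """
    n_dim, h_dim, w_dim, c_dim = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    k_out, r_dim, s_dim = len(w), len(w[0]), len(w[0][0])
    h_out = h_dim + 2 * pad - r_dim + 1
    w_out = w_dim + 2 * pad - s_dim + 1
    gemm_m = n_dim * h_out * w_out         # GEMM M dimension
    gemm_k = r_dim * s_dim * c_dim         # GEMM reduction dimension

    out = [[0.0] * k_out for _ in range(gemm_m)]
    for m in range(gemm_m):
        for k in range(gemm_k):
            n, h, wi, c = im2col_coord(m, k, h_out, w_out, s_dim, c_dim, pad)
            if not (0 <= h < h_dim and 0 <= wi < w_dim):
                continue                   # out-of-bounds taps read as zero padding
            a = x[n][h][wi][c]             # A element loaded straight from the input
            r, rem_k = divmod(k, s_dim * c_dim)
            s = rem_k // c_dim
            for f in range(k_out):
                out[m][f] += a * w[f][r][s][c]
    return out
```

In a tiled GPU kernel the same coordinate arithmetic would run inside the A-tile loader, once per loaded element, so the im2col matrix of size (N·H_out·W_out) × (R·S·C) never has to be written to memory.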

Modules
