Mojo package
rdna
RDNA Conv2D via implicit GEMM (fused im2col + WMMA matmul).
High-performance Conv2D for AMD RDNA 3+ GPUs. Fuses im2col coordinate computation into the WMMA matmul kernel's A-tile loader, eliminating the large intermediate im2col buffer.
Supported: Conv2D fprop with stride=1, dilation=1, BF16/FP16.
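
To illustrate the fusion described above, here is a minimal pure-Python sketch of implicit GEMM for the supported case (fprop, stride=1, dilation=1). It is not the package's kernel: the NHWC input layout, the RSCF filter layout, and the names `im2col_coord`, `load_a`, and `conv2d_implicit_gemm` are assumptions made for illustration, and the scalar loops stand in for WMMA tiles. The point it shows is that each element of the GEMM A matrix is computed from its (m, k) index at load time, so no im2col buffer is ever materialized.

```python
def im2col_coord(m, k, H_out, W_out, S, C, pad_h=0, pad_w=0):
    """Map GEMM A-matrix index (m, k) to input coordinates (n, h, w, c).

    Assumes stride=1, dilation=1, NHWC input, and K ordered as (r, s, c).
    """
    n, rem = divmod(m, H_out * W_out)   # m enumerates (n, h_out, w_out)
    h_out, w_out = divmod(rem, W_out)
    r, rem_k = divmod(k, S * C)         # k enumerates (r, s, c)
    s, c = divmod(rem_k, C)
    return n, h_out + r - pad_h, w_out + s - pad_w, c


def load_a(x, m, k, H_out, W_out, S, pad_h, pad_w):
    """Fetch A[m, k] straight from the input tensor (the fused im2col step)."""
    H, W, C = len(x[0]), len(x[0][0]), len(x[0][0][0])
    n, h, w, c = im2col_coord(m, k, H_out, W_out, S, C, pad_h, pad_w)
    if 0 <= h < H and 0 <= w < W:
        return x[n][h][w][c]
    return 0.0                          # out-of-bounds reads act as zero padding


def conv2d_implicit_gemm(x, wgt, pad_h=0, pad_w=0):
    """x: [N][H][W][C], wgt: [R][S][C][F]. Returns a [M][F] matrix, M = N*H_out*W_out."""
    N, H, W, C = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    R, S, F = len(wgt), len(wgt[0]), len(wgt[0][0][0])
    H_out = H + 2 * pad_h - R + 1
    W_out = W + 2 * pad_w - S + 1
    M, K = N * H_out * W_out, R * S * C
    out = [[0.0] * F for _ in range(M)]
    for m in range(M):
        for k in range(K):
            a = load_a(x, m, k, H_out, W_out, S, pad_h, pad_w)
            r, rem_k = divmod(k, S * C)  # B[k, f] is the filter viewed as [K][F]
            s, c = divmod(rem_k, C)
            for f in range(F):
                out[m][f] += a * wgt[r][s][c][f]
    return out
```

Row m of the result corresponds to output position (n, h_out, w_out), so reshaping the [M][F] matrix to [N][H_out][W_out][F] recovers the usual Conv2D output. In a GPU kernel the same index arithmetic would run per thread while filling an A tile, which is why the intermediate buffer can be dropped.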
Modules
- conv2d_kernel: RDNA Conv2D implicit GEMM kernel with WMMA.
- dispatch: RDNA dispatch for 2-D convolution.