
Mojo module

im2col_matmul_3d

Explicit im2col + _matmul_gpu dispatch for 3D convolution.
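For intuition, the following Python sketch shows what explicit im2col followed by a matrix multiply computes for a 3D convolution. It is not the Mojo kernel. It assumes stride 1, no padding, and no dilation, and it uses nested lists in place of device tensors. The helper names (im2col_3d, qrscf_to_matrix, matmul, conv3d_direct) are hypothetical. Each output voxel becomes one row of an M x K matrix, with M = N*Do*Ho*Wo and K = Q*R*S*C. The QRSCF filter flattens to a K x F matrix, so the whole convolution reduces to a single (M x K) @ (K x F) GEMM.

```python
# Minimal sketch: explicit im2col + GEMM for a 3D convolution.
# Assumptions (illustrative only, not the Mojo kernel): stride 1, no padding,
# no dilation, and nested Python lists standing in for device tensors.

def im2col_3d(x, Q, R, S):
    """Lower an NDHWC input x[n][d][h][w][c] to an M x K matrix.

    M = N*Do*Ho*Wo (one row per output voxel), K = Q*R*S*C, with the columns
    of each row ordered (q, r, s, c) to match a QRSCF filter.
    """
    N, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    A = []
    for n in range(N):
        for d in range(D - Q + 1):
            for h in range(H - R + 1):
                for w in range(W - S + 1):
                    row = []
                    for q in range(Q):
                        for r in range(R):
                            for s in range(S):
                                # Append all C channels of one input voxel.
                                row.extend(x[n][d + q][h + r][w + s])
                    A.append(row)
    return A


def qrscf_to_matrix(f):
    """Flatten a QRSCF filter f[q][r][s][c][k] into a K x F matrix."""
    return [list(c_vec) for q_plane in f for r_row in q_plane
            for s_col in r_row for c_vec in s_col]


def matmul(A, B):
    """Naive (M x K) @ (K x F) product; stands in for the GPU GEMM."""
    F = len(B[0])
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(F)]
            for row in A]


def conv3d_direct(x, f):
    """Reference direct convolution, output as M x F rows in (n, d, h, w) order."""
    N, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    Q, R, S = len(f), len(f[0]), len(f[0][0])
    C, F = len(f[0][0][0]), len(f[0][0][0][0])
    rows = []
    for n in range(N):
        for d in range(D - Q + 1):
            for h in range(H - R + 1):
                for w in range(W - S + 1):
                    acc = [0] * F
                    for q in range(Q):
                        for r in range(R):
                            for s in range(S):
                                for c in range(C):
                                    v = x[n][d + q][h + r][w + s][c]
                                    for k in range(F):
                                        acc[k] += v * f[q][r][s][c][k]
                    rows.append(acc)
    return rows


# Example: 1x3x3x4x2 NDHWC input, 2x2x3 kernel, 2 -> 3 channels (integer data).
N, D, H, W, C = 1, 3, 3, 4, 2
Q, R, S, F = 2, 2, 3, 3
x = [[[[[(d * 7 + h * 3 + w * 5 + c * 11 + n) % 5 - 2 for c in range(C)]
        for w in range(W)] for h in range(H)] for d in range(D)] for n in range(N)]
f = [[[[[(q + 2 * r + 3 * s + 5 * c + 7 * k) % 4 - 1 for k in range(F)]
        for c in range(C)] for s in range(S)] for r in range(R)] for q in range(Q)]

A = im2col_3d(x, Q, R, S)          # M x K = 8 x 24
y = matmul(A, qrscf_to_matrix(f))  # M x F = 8 x 3
```

The example checks the GEMM result against a direct convolution. In this sketch the naive matmul helper occupies the slot that the _matmul_gpu dispatch fills in this module.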

Mirrors the AMD RDNA 2D pattern in max/kernels/src/nn/conv/gpu/amd/rdna/dispatch.mojo, extended to 3D (NDHWC input, QRSCF or FCQRS filter) and bounded by an M-tile loop so that large video resolutions do not exceed the scratch budget.
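The M-tile loop and the two filter layouts can be illustrated with another hypothetical Python sketch under the same simplifying assumptions (stride 1, no padding, nested lists as tensors). Here the scratch budget is assumed to be spent entirely on one im2col tile of tile_m x K elements, each elem_bytes wide (2 bytes, as for bf16). The real kernel's budget accounting may differ. Names such as conv3d_im2col_tiled, filter_to_matrix, scratch_bytes, and elem_bytes are illustrative and are not this module's API.

```python
# Sketch: M-tiled explicit im2col + GEMM for a 3D convolution.
# Hypothetical illustration (not the Mojo kernel): stride 1, no padding,
# nested lists as tensors, scratch counted as tile_m * K * elem_bytes.


def filter_to_matrix(f, layout):
    """Return the K x F GEMM operand (rows ordered q, r, s, c) and the dims."""
    if layout == "QRSCF":
        Q, R, S, C = len(f), len(f[0]), len(f[0][0]), len(f[0][0][0])
        F = len(f[0][0][0][0])

        def get(q, r, s, c, k):
            return f[q][r][s][c][k]
    elif layout == "FCQRS":
        F, C, Q, R = len(f), len(f[0]), len(f[0][0]), len(f[0][0][0])
        S = len(f[0][0][0][0])

        def get(q, r, s, c, k):
            return f[k][c][q][r][s]
    else:
        raise ValueError("unsupported filter layout: " + layout)
    B = [[get(q, r, s, c, k) for k in range(F)]
         for q in range(Q) for r in range(R) for s in range(S) for c in range(C)]
    return B, (Q, R, S, C, F)


def conv3d_im2col_tiled(x, f, layout, scratch_bytes, elem_bytes=2):
    """Convolve NDHWC x with f; returns (M x F output rows, number of M tiles)."""
    B, (Q, R, S, C, F) = filter_to_matrix(f, layout)
    N, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    Do, Ho, Wo = D - Q + 1, H - R + 1, W - S + 1
    M, K = N * Do * Ho * Wo, Q * R * S * C
    # Largest row count whose tile_m x K im2col buffer fits the budget
    # (at least one row, even if a single row exceeds the budget).
    tile_m = max(1, scratch_bytes // (K * elem_bytes))
    out, num_tiles = [], 0
    for m0 in range(0, M, tile_m):
        num_tiles += 1
        a_tile = []  # scratch: at most tile_m rows of K elements
        for m in range(m0, min(m0 + tile_m, M)):
            # Decode the flat GEMM row index into output coordinates (n, d, h, w).
            n, rem = divmod(m, Do * Ho * Wo)
            d, rem = divmod(rem, Ho * Wo)
            h, w = divmod(rem, Wo)
            row = []
            for q in range(Q):
                for r in range(R):
                    for s in range(S):
                        row.extend(x[n][d + q][h + r][w + s])
            a_tile.append(row)
        # GEMM for this tile: (rows x K) @ (K x F), appended in M order.
        for row in a_tile:
            out.append([sum(a * B[k][j] for k, a in enumerate(row))
                        for j in range(F)])
    return out, num_tiles


# Example: 1x3x3x4x2 NDHWC input, 2x2x3 kernel, 3 output channels, so
# M = 8 output voxels and K = 24; one im2col row is 24 * 2 = 48 bytes.
N, D, H, W, C = 1, 3, 3, 4, 2
Q, R, S, F = 2, 2, 3, 3
x = [[[[[(d * 7 + h * 3 + w * 5 + c * 11 + n) % 5 - 2 for c in range(C)]
        for w in range(W)] for h in range(H)] for d in range(D)] for n in range(N)]
f_qrscf = [[[[[(q + 2 * r + 3 * s + 5 * c + 7 * k) % 4 - 1 for k in range(F)]
              for c in range(C)] for s in range(S)] for r in range(R)] for q in range(Q)]
# Same weights in FCQRS order: f_fcqrs[k][c][q][r][s] == f_qrscf[q][r][s][c][k].
f_fcqrs = [[[[[f_qrscf[q][r][s][c][k] for s in range(S)] for r in range(R)]
             for q in range(Q)] for c in range(C)] for k in range(F)]

out_small, tiles_small = conv3d_im2col_tiled(x, f_qrscf, "QRSCF", scratch_bytes=96)
out_big, tiles_big = conv3d_im2col_tiled(x, f_fcqrs, "FCQRS", scratch_bytes=1 << 20)
```

Because each tile only materializes its own tile_m rows of the im2col matrix, peak scratch use in this sketch depends on the tile size rather than on the full N*Do*Ho*Wo output size. In the example, the 96-byte budget forces four tiles of two rows each, the large budget runs in a single tile, and both filter layouts produce identical outputs.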

The generic _matmul_gpu automatically routes bf16 inputs to SM100 UMMA on Blackwell, so this path gives the native 3D convolution access to tensor cores without going through the TMA im2col descriptor layer.

Functions