
Mojo module

im2col_matmul_3d

Explicit im2col + _matmul_gpu dispatch for 3D convolution.

Mirrors the AMD RDNA 2D pattern in max/kernels/src/nn/conv/gpu/amd/rdna/dispatch.mojo, extended to 3D (NDHWC input, QRSCF or FCQRS filter). An M-tile loop bounds the im2col scratch buffer, so large video resolutions do not exceed the scratch budget.

The generic _matmul_gpu auto-routes to SM100 UMMA on Blackwell for bf16, so this path gives the native 3D conv access to tensor cores without touching the TMA im2col descriptor layer.
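
The explicit im2col plus matmul pattern with an M-tile loop can be illustrated with a small reference sketch. This is plain Python, not the Mojo kernel: the function name `conv3d_im2col_tiled`, the `tile_m` parameter, the nested-list tensors, and the stride-1, no-padding restriction are all assumptions made for illustration, and a plain Python loop stands in for `_matmul_gpu`. It assumes NDHWC input and a QRSCF filter, so each output position becomes one row of length K = Q*R*S*C, and only `tile_m` rows are materialized at a time.

```python
def conv3d_im2col_tiled(x, wt, tile_m):
    """Reference 3D convolution via im2col + matmul, tiled along M.

    x:  nested lists indexed [N][D][H][W][C]  (NDHWC input)
    wt: nested lists indexed [Q][R][S][C][F]  (QRSCF filter)
    Assumes stride 1 and no padding (illustrative simplification).
    Returns an M x F matrix with rows ordered by (n, d, h, w).
    """
    N, D, H, W, C = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]), len(x[0][0][0][0])
    Q, R, S, F = len(wt), len(wt[0]), len(wt[0][0]), len(wt[0][0][0][0])
    Do, Ho, Wo = D - Q + 1, H - R + 1, W - S + 1
    M, K = N * Do * Ho * Wo, Q * R * S * C

    # Flatten the filter into a K x F matrix; k runs over (q, r, s, c).
    b = [wt[q][r][s][c]
         for q in range(Q) for r in range(R) for s in range(S) for c in range(C)]

    out = []
    for m0 in range(0, M, tile_m):
        m1 = min(m0 + tile_m, M)

        # im2col for this tile only: the scratch buffer holds at most
        # tile_m * K elements, regardless of the total M.
        scratch = []
        for m in range(m0, m1):
            n, rem = divmod(m, Do * Ho * Wo)
            d, rem = divmod(rem, Ho * Wo)
            h, w = divmod(rem, Wo)
            scratch.append([x[n][d + q][h + r][w + s][c]
                            for q in range(Q) for r in range(R)
                            for s in range(S) for c in range(C)])

        # Matmul of the (tile rows x K) scratch against the K x F filter.
        for row in scratch:
            out.append([sum(row[k] * b[k][f] for k in range(K)) for f in range(F)])
    return out
```

The result does not depend on `tile_m`; the tile size only trades peak scratch memory (`tile_m * K` elements) against the number of matmul calls.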

Functions
