Mojo module
im2col_matmul_3d
Explicit im2col + _matmul_gpu dispatch for 3D convolution.
Mirrors the AMD RDNA 2D pattern in
max/kernels/src/nn/conv/gpu/amd/rdna/dispatch.mojo, extended to 3D
(NDHWC input, QRSCF or FCQRS filter) and bounded by an M-tile loop so
that large video resolutions do not exceed the scratch-buffer budget.
The generic _matmul_gpu auto-routes to SM100 UMMA on Blackwell for
bf16, so this path gives the native 3D conv access to tensor cores
without touching the TMA im2col descriptor layer.
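The im2col + matmul idea with an M-tile loop can be illustrated with a small CPU reference sketch. This is not the module's Mojo kernel; it is a pure-Python illustration under stated assumptions: stride 1, no padding, no dilation, NDHWC input and QRSCF filter held as nested lists. The name `conv3d_im2col_tiled` and the `tile_m` parameter are hypothetical. Here M = N * Do * Ho * Wo (output positions) and K = Q * R * S * C (patch size), so each tile needs only a `tile_m x K` scratch buffer instead of the full `M x K` im2col matrix.

```python
def conv3d_im2col_tiled(x, w, tile_m=4):
    """Reference 3D conv via im2col + matmul, tiled over M.

    x: NDHWC nested lists, w: QRSCF nested lists.
    Assumes stride 1, no padding, no dilation (illustrative only).
    Returns (out, shape), where out is an M x F matrix (list of rows)
    and shape is the logical NDHWC output shape (N, Do, Ho, Wo, F).
    """
    N, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    C = len(x[0][0][0][0])
    Q, R, S = len(w), len(w[0]), len(w[0][0])
    assert len(w[0][0][0]) == C, "filter channel count must match input"
    F = len(w[0][0][0][0])
    Do, Ho, Wo = D - Q + 1, H - R + 1, W - S + 1
    M, K = N * Do * Ho * Wo, Q * R * S * C

    # QRSCF is already row-major in (q, r, s, c), so flattening those
    # four axes yields the K x F matmul operand directly.
    w_mat = [w[q][r][s][c]
             for q in range(Q) for r in range(R)
             for s in range(S) for c in range(C)]

    out = [None] * M
    for m0 in range(0, M, tile_m):          # M-tile loop bounds scratch size
        m1 = min(m0 + tile_m, M)
        scratch = []                        # at most tile_m x K elements
        for m in range(m0, m1):
            # Decode the flat output index m into (n, d, h, wo).
            n, rem = divmod(m, Do * Ho * Wo)
            d, rem = divmod(rem, Ho * Wo)
            h, wo = divmod(rem, Wo)
            # One im2col row: the receptive field in (q, r, s, c) order.
            scratch.append([x[n][d + q][h + r][wo + s][c]
                            for q in range(Q) for r in range(R)
                            for s in range(S) for c in range(C)])
        # Generic matmul on the tile: (tile x K) @ (K x F).
        for i, row in enumerate(scratch):
            out[m0 + i] = [sum(row[k] * w_mat[k][f] for k in range(K))
                           for f in range(F)]
    return out, (N, Do, Ho, Wo, F)
```

Because each tile is independent, the result does not depend on `tile_m`; a smaller tile trades more matmul launches for a smaller scratch buffer. An FCQRS filter would first need a transpose into the K x F layout, which this sketch omits.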
Functions
- dispatch_im2col_matmul_conv3d: Try to dispatch a 3-D conv as explicit im2col + generic matmul.