Mojo module
im2col_matmul_2d
Explicit im2col + _matmul_gpu dispatch for 2D convolution.
Materialises an im2col [M, K] scratch into global memory and calls the
generic _matmul_gpu on it. _matmul_gpu auto-routes to SM100 UMMA on
Blackwell for bf16, giving non-128-aligned-channel 2-D convs access to
tensor cores without the TMA im2col descriptor layer.
- M = batch * H_out * W_out (linearized output pixel)
- K = R * S * C_in (filter-flattened reduction axis)
- N = C_out (output channels)
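The mapping above can be illustrated with a small host-side reference. This is only a sketch in plain Python, not the module's GPU implementation: the function names (`im2col_dims`, `im2col`, `matmul`), the NHWC input layout, the (r, s, c) ordering along K, and the `stride`/`pad` parameters are all assumptions made for illustration.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Output extent of a convolution along one spatial axis (dilation = 1)."""
    return (size + 2 * pad - kernel) // stride + 1


def im2col_dims(batch, h, w, c_in, r, s, c_out, stride=1, pad=0):
    """Return the GEMM shape (M, K, N) for the convolution."""
    h_out = out_size(h, r, stride, pad)
    w_out = out_size(w, s, stride, pad)
    return batch * h_out * w_out, r * s * c_in, c_out


def im2col(x, r, s, stride=1, pad=0):
    """Build the [M, K] scratch from x, a nested list in assumed NHWC layout.

    Row m is one linearized output pixel (n, oh, ow); column k walks the
    filter window in assumed (r, s, c) order. Out-of-bounds taps read 0.
    """
    batch, h, w, c_in = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    h_out, w_out = out_size(h, r, stride, pad), out_size(w, s, stride, pad)
    rows = []
    for n in range(batch):
        for oh in range(h_out):
            for ow in range(w_out):
                row = []
                for fr in range(r):
                    for fs in range(s):
                        ih = oh * stride - pad + fr
                        iw = ow * stride - pad + fs
                        inside = 0 <= ih < h and 0 <= iw < w
                        for c in range(c_in):
                            row.append(x[n][ih][iw][c] if inside else 0.0)
                rows.append(row)
    return rows


def matmul(a, b):
    """Naive [M, K] x [K, N] product standing in for the generic matmul."""
    n_cols = len(b[0])
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(n_cols)]
        for row in a
    ]


# Example: batch=2, 5x5 input, 3 input channels, 3x3 filter, 4 output channels.
m, k, n = im2col_dims(2, 5, 5, 3, 3, 3, 4)
x = [[[[1.0] * 3 for _ in range(5)] for _ in range(5)] for _ in range(2)]
scratch = im2col(x, 3, 3)          # [M, K]
weights = [[1.0] * n for _ in range(k)]  # [K, N], filter flattened along K
out = matmul(scratch, weights)     # [M, N]
```

Each row of `out` holds the `C_out` values of one output pixel, so reshaping `[M, N]` back to `[batch, H_out, W_out, C_out]` recovers the convolution result under the layout assumed here.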
Gate: bf16, groups=1, dilation=1, kernel larger than 1×1 (the vectorized naive kernel wins on 1×1), and K >= 16 (a smaller K falls below MMA_K).
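The gate conditions can be read as a single predicate. The sketch below is hypothetical: `im2col_matmul_gate` and its parameters are illustrative names, not the module's API, `MMA_K = 16` is inferred from the gate text, and "kernel > 1×1" is interpreted as "not both R and S equal to 1".

```python
MMA_K = 16  # assumed from the gate text: K below 16 is below MMA_K


def im2col_matmul_gate(dtype, groups, dilation, r, s, c_in):
    """Return True when a conv would pass the gate described above.

    dtype is a plain string and dilation a (dil_h, dil_w) tuple; both are
    illustrative stand-ins for whatever the real dispatcher inspects.
    """
    k = r * s * c_in  # filter-flattened reduction axis
    return (
        dtype == "bf16"
        and groups == 1
        and tuple(dilation) == (1, 1)
        and not (r == 1 and s == 1)  # 1x1 is left to the vectorized naive kernel
        and k >= MMA_K
    )


# A 3x3 filter over 3 input channels gives K = 27, which passes.
accepted = im2col_matmul_gate("bf16", 1, (1, 1), 3, 3, 3)
# A 3x3 filter over 1 input channel gives K = 9, which is rejected.
rejected_small_k = im2col_matmul_gate("bf16", 1, (1, 1), 3, 3, 1)
```

When the predicate is False, a caller would presumably fall back to another convolution path; which one is not specified here.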
Functions
- dispatch_im2col_matmul_conv2d: Try to dispatch a 2-D conv as explicit im2col + generic matmul.