
Mojo module

im2col_matmul_2d

Explicit im2col + _matmul_gpu dispatch for 2D convolution.

Materialises an im2col [M, K] scratch buffer in global memory and calls the generic _matmul_gpu on it. For bf16 on Blackwell, _matmul_gpu auto-routes to SM100 UMMA, so 2D convolutions whose channel counts are not 128-aligned can use tensor cores without the TMA im2col descriptor layer.

  • M = batch * H_out * W_out (linearized output pixel)
  • K = R * S * C_in (filter-flattened reduction axis)
  • N = C_out (output channels)
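The M, K, N mapping above can be illustrated with a small host-side reference. This is a minimal sketch in plain Python, not the module's GPU code: the function names (`im2col_nhwc`, `matmul`, `conv2d_im2col`), the NHWC input layout, the [R][S][C_in][C_out] filter layout, and the (r, s, c) ordering along K are all assumptions made for illustration.

```python
def im2col_nhwc(x, R, S, stride=1, pad=0):
    """Build the [M, K] im2col matrix from a nested-list NHWC input.

    Assumed layout: x[n][h][w][c]. Rows are linearized output pixels
    (M = batch * H_out * W_out); columns are ordered (r, s, c), so
    K = R * S * C_in. Dilation is fixed at 1.
    """
    N, H, W, C = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    H_out = (H + 2 * pad - R) // stride + 1
    W_out = (W + 2 * pad - S) // stride + 1
    cols = []
    for n in range(N):
        for ho in range(H_out):
            for wo in range(W_out):
                row = []
                for r in range(R):
                    for s in range(S):
                        h = ho * stride + r - pad
                        w = wo * stride + s - pad
                        inside = 0 <= h < H and 0 <= w < W
                        for c in range(C):
                            # Out-of-bounds taps read as zero padding.
                            row.append(x[n][h][w][c] if inside else 0)
                cols.append(row)
    return cols, (N, H_out, W_out)


def matmul(a, b):
    """Plain [M, K] x [K, N] product, standing in for the GPU matmul."""
    k_dim, n_dim = len(b), len(b[0])
    return [[sum(row[k] * b[k][n] for k in range(k_dim)) for n in range(n_dim)]
            for row in a]


def conv2d_im2col(x, filt, stride=1, pad=0):
    """2D convolution as im2col followed by one matmul.

    Assumed filter layout: filt[r][s][c_in][c_out], flattened to [K, N]
    in the same (r, s, c) order used by im2col_nhwc.
    """
    R, S, C_in = len(filt), len(filt[0]), len(filt[0][0])
    cols, (N, H_out, W_out) = im2col_nhwc(x, R, S, stride, pad)
    w = [filt[r][s][c] for r in range(R) for s in range(S) for c in range(C_in)]
    out = matmul(cols, w)  # [M, N] with N = C_out
    return out, (N, H_out, W_out, len(w[0]))
```

For a 1×3×3×1 input and a 2×2 filter with one output channel, this yields M = 4, K = 4, N = 1. The scratch matrix duplicates overlapping input windows, which is the memory cost of the explicit approach compared with an implicit im2col.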

Gate: bf16, groups=1, dilation=1, kernel > 1×1 (the vectorized naive kernel wins on 1×1), K >= 16 (a smaller K would fall below MMA_K).
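The gate can be read as a single predicate over the convolution's parameters. The sketch below is a hypothetical restatement in Python: the function name, argument names, and the dtype string are assumptions, "kernel > 1×1" is interpreted as "not a 1×1 filter", and the threshold of 16 is taken directly from the text above.

```python
def passes_im2col_matmul_gate(dtype, groups, dilation, R, S, C_in):
    """Return True when a conv meets the conditions listed in the gate.

    Hypothetical helper for illustration only; the real dispatch logic
    lives in the module and may check additional conditions.
    """
    K = R * S * C_in  # filter-flattened reduction axis
    return (
        dtype == "bfloat16"
        and groups == 1
        and dilation == (1, 1)
        and not (R == 1 and S == 1)  # 1x1 goes to the vectorized naive kernel
        and K >= 16
    )
```

Under this reading, a 3×3 filter with C_in = 3 passes (K = 27), while a 3×3 filter with C_in = 1 does not (K = 9).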

Functions