
Mojo module

conv2d_fprop_kernel

SM100 Conv2D Forward Propagation Kernel.

This module implements a high-performance Conv2D fprop kernel for NVIDIA Blackwell (SM100) GPUs using the Structured Kernel architecture.

The kernel computes the convolution as an implicit GEMM:

  • Maps the convolution to a GEMM: C[M,N] = A[M,K] @ B[K,N]
  • M = batch * out_h * out_w (output spatial positions)
  • N = out_channels (number of filters)
  • K = in_channels * filter_h * filter_w (reduction dimension)
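
The mapping above can be illustrated with a small pure-Python sketch that forms each A[m, k] element on the fly from input coordinates (the "implicit" part, so no im2col buffer is materialized) and checks the result against a direct convolution. This is an illustration only, not the kernel's code. It assumes NHWC input x[n][h][w][c], filters stored as w[o][r][s][c], stride=1, dilation=1, symmetric zero padding `pad`, and a K ordering of (filter_h, filter_w, in_channels) with channels fastest. None of these layouts are stated in this module summary, and all function names are hypothetical.

```python
# Minimal implicit-GEMM Conv2D fprop sketch (pure Python, illustration only).
# Assumed (not from the module docs): NHWC input x[n][h][w][c], filters
# w[o][r][s][c], stride=1, dilation=1, symmetric zero padding `pad`, and
# K ordered as (filter_h, filter_w, in_channels) with channels fastest.

def gemm_dims(batch, in_h, in_w, in_ch, out_ch, filter_h, filter_w, pad=0):
    """Return (M, N, K, out_h, out_w) for C[M,N] = A[M,K] @ B[K,N]."""
    out_h = in_h + 2 * pad - filter_h + 1  # stride=1, dilation=1
    out_w = in_w + 2 * pad - filter_w + 1
    M = batch * out_h * out_w              # output spatial positions
    N = out_ch                             # filters
    K = in_ch * filter_h * filter_w        # reduction
    return M, N, K, out_h, out_w


def decode_k(k, filter_w, in_ch):
    """Split a reduction index k into (filter row r, filter col s, channel c)."""
    r, rem = divmod(k, filter_w * in_ch)
    s, c = divmod(rem, in_ch)
    return r, s, c


def conv2d_implicit_gemm(x, w, pad=0):
    """Compute the conv as a GEMM, forming A[m, k] on the fly."""
    batch, in_h, in_w, in_ch = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    out_ch, filter_h, filter_w = len(w), len(w[0]), len(w[0][0])
    M, N, K, out_h, out_w = gemm_dims(
        batch, in_h, in_w, in_ch, out_ch, filter_h, filter_w, pad)
    C = [[0.0] * N for _ in range(M)]
    for m in range(M):
        # Decode the GEMM row into (batch index, output row, output col).
        b, rem = divmod(m, out_h * out_w)
        oh, ow = divmod(rem, out_w)
        for n in range(N):
            acc = 0.0
            for k in range(K):
                r, s, c = decode_k(k, filter_w, in_ch)
                ih, iw = oh + r - pad, ow + s - pad
                if 0 <= ih < in_h and 0 <= iw < in_w:  # else A[m,k] is padding (0)
                    acc += x[b][ih][iw][c] * w[n][r][s][c]  # A[m,k] * B[k,n]
            C[m][n] = acc
    return C  # row m is the flattened (batch, out_h, out_w) output position


def conv2d_direct(x, w, pad=0):
    """Reference direct convolution with the same layout assumptions."""
    batch, in_h, in_w, in_ch = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    out_ch, filter_h, filter_w = len(w), len(w[0]), len(w[0][0])
    out_h = in_h + 2 * pad - filter_h + 1
    out_w = in_w + 2 * pad - filter_w + 1
    out = []
    for b in range(batch):
        for oh in range(out_h):
            for ow in range(out_w):
                row = []
                for o in range(out_ch):
                    acc = 0.0
                    for r in range(filter_h):
                        for s in range(filter_w):
                            ih, iw = oh + r - pad, ow + s - pad
                            if 0 <= ih < in_h and 0 <= iw < in_w:
                                for c in range(in_ch):
                                    acc += x[b][ih][iw][c] * w[o][r][s][c]
                    row.append(acc)
                out.append(row)
    return out
```

For example, a 1x4x4x2 input with three 3x3 filters and pad=1 gives gemm_dims(1, 4, 4, 2, 3, 3, 3, pad=1) == (16, 3, 18, 4, 4): 16 output positions, 3 filters, and a reduction length of 2 * 3 * 3 = 18.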

The implementation reuses the existing matmul infrastructure:

  • 8-warp specialization (scheduler, load, MMA, epilogue load, epilogue)
  • TMA-based tile loading with im2col addressing
  • Accumulators held in Tensor Memory (TMEM)
  • Producer-consumer pipelining
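
The producer-consumer pipelining listed above can be pictured as a bounded ring of buffer stages: a load role fills stages with K-tiles while an MMA role drains them and accumulates. The sketch below is an analogy only, not the kernel's mechanism. Python threads stand in for warps, and two counting semaphores stand in for the "empty" and "full" barriers. The reduction for a single output element is split into K-tiles. The names `pipelined_dot` and `num_stages` are hypothetical.

```python
import threading


def pipelined_dot(a_tiles, b_tiles, num_stages=3):
    """Reduce sum(a . b) over K-tiles through a ring of `num_stages` buffers.

    Analogy only: the producer thread plays the load warp, the consumer
    thread plays the MMA warp, and semaphores play the stage barriers.
    """
    stages = [None] * num_stages
    empty = threading.Semaphore(num_stages)  # stages free for the producer
    full = threading.Semaphore(0)            # stages ready for the consumer
    result = [0]

    def producer():  # "load warp"
        for i, (a, b) in enumerate(zip(a_tiles, b_tiles)):
            empty.acquire()                  # wait until stage i % S is free
            stages[i % num_stages] = (a, b)  # "load" the K-tile into the stage
            full.release()                   # signal that the stage is ready

    def consumer():  # "MMA warp"
        acc = 0
        for i in range(len(a_tiles)):
            full.acquire()                   # wait for stage i % S to be filled
            a, b = stages[i % num_stages]
            acc += sum(x * y for x, y in zip(a, b))  # accumulate partial product
            empty.release()                  # hand the stage back to the producer
        result[0] = acc

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Because the producer must acquire an "empty" slot before writing, it can never run more than `num_stages` tiles ahead of the consumer. This bound is what lets loads for later K-tiles overlap with math on earlier ones without overwriting data that has not yet been consumed.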

Supported configurations (optimized for the Flux VAE):

  • stride=1, dilation=1 (the most common case in the VAE decoder)
  • 3x3 and 1x1 filters
  • BF16/FP16 data types
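
The supported envelope above amounts to a simple dispatch check. The sketch below expresses it as a predicate. The function name, the tuple-valued arguments, and the dtype strings are illustrative assumptions and are not this module's API.

```python
# Hypothetical dispatch predicate mirroring the supported configurations
# listed above. Names, argument shapes, and dtype strings are illustrative
# assumptions, not part of this module's API.

SUPPORTED_FILTERS = {(3, 3), (1, 1)}
SUPPORTED_DTYPES = {"bfloat16", "float16"}


def is_supported_config(stride, dilation, filter_hw, dtype):
    """Return True if a Conv2D fprop problem falls inside the documented envelope."""
    return (
        tuple(stride) == (1, 1)
        and tuple(dilation) == (1, 1)
        and tuple(filter_hw) in SUPPORTED_FILTERS
        and dtype in SUPPORTED_DTYPES
    )
```

Consider a 1x1 filter with stride=1 and no padding, assuming NHWC layout. In that case A is simply the input viewed as a (batch * H * W) x in_channels matrix, so the convolution reduces to a plain GEMM with K = in_channels.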

Structs