IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo module

conv2d

Public API for SM100 Conv2D forward propagation.

This module provides the high-level API for launching Conv2D fprop kernels on NVIDIA Blackwell (SM100) GPUs. It handles:

  • TMA descriptor setup for activation (with im2col), filter, and output
  • Kernel configuration selection
  • Kernel launch with proper grid/block dimensions
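
To make the "im2col" term above concrete, here is a minimal plain-Python sketch of an explicit im2col for an NHWC input with stride=1 and dilation=1. It is not this module's code: the function name `im2col_nhwc` and the nested-list representation are assumptions made for illustration. The kernel is described as doing this implicitly through the TMA descriptor, so it presumably never materializes such a matrix.

```python
def im2col_nhwc(x, filter_h, filter_w, pad_h, pad_w):
    """Build an explicit im2col matrix for an NHWC input (stride=1, dilation=1).

    x is a nested list indexed as x[n][h][w][c]. Each returned row holds the
    filter_h * filter_w * C input values under one output pixel, flattened in
    (r, s, c) order. Out-of-bounds (padding) positions contribute zeros.
    """
    n_batch, in_h, in_w, in_c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    out_h = in_h + 2 * pad_h - filter_h + 1
    out_w = in_w + 2 * pad_w - filter_w + 1
    rows = []
    for n in range(n_batch):
        for p in range(out_h):          # output row
            for q in range(out_w):      # output column
                row = []
                for r in range(filter_h):
                    for s in range(filter_w):
                        h, w = p + r - pad_h, q + s - pad_w
                        inside = 0 <= h < in_h and 0 <= w < in_w
                        for c in range(in_c):
                            row.append(x[n][h][w][c] if inside else 0.0)
                rows.append(row)
    return rows


# Hypothetical 1x3x3x1 input holding the values 0..8, 3x3 filter, padding 1.
x = [[[[float(3 * h + w)] for w in range(3)] for h in range(3)]]
cols = im2col_nhwc(x, filter_h=3, filter_w=3, pad_h=1, pad_w=1)
print(len(cols), len(cols[0]))  # number of output pixels, values per pixel
```

Multiplying each row by a filter flattened in the same (r, s, c) order gives one output value per output channel, which is why convolution can be run as a matrix multiply.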

Usage (4D NHWC API with implicit im2col):

from nn.conv.gpu.nvidia.sm100 import conv2d_fprop

var problem = Conv2dProblemShape(
    batch=1,
    in_height=256, in_width=256, in_channels=64,
    out_channels=128,
    filter_h=3, filter_w=3,
    pad_h=1, pad_w=1,
)
conv2d_fprop(output, input, filter, problem, ctx)
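
For the problem shape in the usage example, the output extent and the conventional implicit-GEMM dimensions can be worked out by hand. The plain-Python sketch below assumes stride=1 and dilation=1. The helper `conv2d_fprop_shapes` and the `gemm_*` names are hypothetical and not part of this module, and the mapping M = N·P·Q, N = K, K = R·S·C is the usual implicit-GEMM view rather than a documented detail of this kernel.

```python
def conv2d_fprop_shapes(batch, in_height, in_width, in_channels,
                        out_channels, filter_h, filter_w, pad_h, pad_w):
    """Return the NHWC output shape and implicit-GEMM sizes (stride=1, dilation=1)."""
    out_h = in_height + 2 * pad_h - filter_h + 1
    out_w = in_width + 2 * pad_w - filter_w + 1
    return {
        "output_nhwc": (batch, out_h, out_w, out_channels),
        "gemm_m": batch * out_h * out_w,               # one row per output pixel
        "gemm_n": out_channels,                        # one column per filter
        "gemm_k": filter_h * filter_w * in_channels,   # reduction length
    }


shapes = conv2d_fprop_shapes(
    batch=1,
    in_height=256, in_width=256, in_channels=64,
    out_channels=128,
    filter_h=3, filter_w=3,
    pad_h=1, pad_w=1,
)
print(shapes)
```

With a 3x3 filter and padding of 1 the spatial size is preserved, so the output is 1 x 256 x 256 x 128, and the equivalent GEMM is 65536 x 128 with a reduction length of 576.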

Note: This implementation currently supports:

  • stride=1, dilation=1
  • NHWC layout for activation and output
  • KRSC layout for filter
  • BF16/FP16 data types

Functions