Mojo module

conv2d

Public API for SM100 Conv2D forward propagation.

This module provides the high-level API for launching Conv2D fprop kernels on NVIDIA Blackwell (SM100) GPUs. It handles:

  • TMA descriptor setup for activation (with im2col), filter, and output
  • Kernel configuration selection
  • Kernel launch with proper grid/block dimensions

Usage (4D NHWC API with implicit im2col):

from nn.conv_sm100 import conv2d_fprop

var problem = Conv2dProblemShape(
    batch=1,
    in_height=256, in_width=256, in_channels=64,
    out_channels=128,
    filter_h=3, filter_w=3,
    pad_h=1, pad_w=1,
)
conv2d_fprop(output, input, filter, problem, ctx)
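For the problem shape above, the expected output shape and the equivalent GEMM dimensions can be worked out by hand. The sketch below assumes the standard implicit-GEMM mapping for forward convolution (M = batch * out_h * out_w, N = out_channels, K = filter_h * filter_w * in_channels) and the stride=1, dilation=1 case. `conv_out_dim` is a hypothetical helper written for illustration; it is not part of `nn.conv_sm100`, and the kernel's internal tiling may differ.

```mojo
# Hypothetical helper for illustration only; not part of nn.conv_sm100.
fn conv_out_dim(in_dim: Int, filter_dim: Int, pad: Int) -> Int:
    # Output extent along one spatial axis, assuming stride=1 and dilation=1.
    return in_dim + 2 * pad - filter_dim + 1


fn main():
    # Values taken from the usage example above.
    var batch = 1
    var in_channels = 64
    var out_channels = 128
    var filter_h = 3
    var filter_w = 3

    var out_h = conv_out_dim(256, filter_h, 1)  # 256 + 2 - 3 + 1 = 256
    var out_w = conv_out_dim(256, filter_w, 1)  # 256

    # Implicit-GEMM view (assumed standard fprop mapping).
    var gemm_m = batch * out_h * out_w                 # 65536
    var gemm_n = out_channels                          # 128
    var gemm_k = filter_h * filter_w * in_channels     # 576

    print("output NHWC:", batch, out_h, out_w, out_channels)
    print("GEMM M, N, K:", gemm_m, gemm_n, gemm_k)
```

With a 3x3 filter and padding of 1, the spatial size is preserved, so `output` should be allocated as a 1 x 256 x 256 x 128 NHWC tensor before the call.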

Note: This implementation currently supports:

  • stride=1, dilation=1
  • NHWC layout for activation and output
  • KRSC layout for filter
  • BF16/FP16 data types
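Callers may want to reject unsupported configurations before launching. The following is a minimal sketch of such a guard based only on the restrictions listed above. The function names and the stride/dilation parameters are assumptions made for illustration; the module may expose or check these differently.

```mojo
# Hypothetical pre-launch checks; names and parameters are illustrative
# and are not part of nn.conv_sm100.
fn is_supported_dtype(dtype: DType) -> Bool:
    # Only BF16 and FP16 are listed as supported.
    return dtype == DType.bfloat16 or dtype == DType.float16


fn is_supported_geometry(
    stride_h: Int, stride_w: Int, dilation_h: Int, dilation_w: Int
) -> Bool:
    # Only stride=1 and dilation=1 are listed as supported.
    return (
        stride_h == 1
        and stride_w == 1
        and dilation_h == 1
        and dilation_w == 1
    )


fn main():
    print(is_supported_dtype(DType.bfloat16))   # supported dtype
    print(is_supported_dtype(DType.float32))    # unsupported dtype
    print(is_supported_geometry(1, 1, 1, 1))    # supported geometry
    print(is_supported_geometry(2, 2, 1, 1))    # strided: unsupported
```

Layout (NHWC activation and output, KRSC filter) is not checked here, since it depends on how the tensors are constructed rather than on scalar parameters.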

Functions
