Mojo module
conv2d
Public API for SM100 Conv2D forward propagation.
This module provides the high-level API for launching Conv2D fprop kernels on NVIDIA Blackwell (SM100) GPUs. It handles:
- TMA descriptor setup for activation (with im2col), filter, and output
- Kernel configuration selection
- Kernel launch with proper grid/block dimensions
Usage (4D NHWC API with implicit im2col):

    from nn.conv_sm100 import conv2d_fprop

    var problem = Conv2dProblemShape(
        batch=1,
        in_height=256, in_width=256, in_channels=64,
        out_channels=128,
        filter_h=3, filter_w=3,
        pad_h=1, pad_w=1,
    )
    conv2d_fprop(output, input, filter, problem, ctx)

Note: This implementation currently supports:
- stride=1, dilation=1
- NHWC layout for activation and output
- KRSC layout for filter
- BF16/FP16 data types
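Under the stride=1, dilation=1 restriction, the output size and the equivalent GEMM extents follow directly from the problem shape. The sketch below is a plain Python reference, not part of this module: the helper name `conv2d_fprop_dims` is hypothetical, its parameters simply mirror the keyword arguments in the usage example, and the M/N/K mapping is the standard implicit-GEMM formulation, assumed here rather than confirmed as the kernel's internal tiling.

```python
def conv2d_fprop_dims(batch, in_height, in_width, in_channels,
                      out_channels, filter_h, filter_w, pad_h, pad_w):
    """Output shape and implicit-GEMM extents for stride=1, dilation=1."""
    # With unit stride and dilation, each spatial output extent is
    # input + 2 * padding - filter + 1.
    out_height = in_height + 2 * pad_h - filter_h + 1
    out_width = in_width + 2 * pad_w - filter_w + 1
    return {
        # NHWC output: (N, P, Q, K)
        "output_nhwc": (batch, out_height, out_width, out_channels),
        # One GEMM row per output pixel.
        "gemm_m": batch * out_height * out_width,
        # One GEMM column per output channel.
        "gemm_n": out_channels,
        # Reduction over the filter window and input channels (R * S * C).
        "gemm_k": filter_h * filter_w * in_channels,
    }


# Same problem shape as the usage example above.
dims = conv2d_fprop_dims(
    batch=1,
    in_height=256, in_width=256, in_channels=64,
    out_channels=128,
    filter_h=3, filter_w=3,
    pad_h=1, pad_w=1,
)
```

For the example shape, a 3x3 filter with padding 1 preserves the 256x256 spatial size, so the equivalent GEMM is 65536 x 128 with a reduction length of 576.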
Functions
- conv2d_fprop: Launch Conv2D forward propagation with 4D NHWC API and implicit im2col.
- conv2d_fprop_with_residual: Launch Conv2D fprop with residual add.
- im2col: Explicit im2col transformation for convolution.
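To illustrate what an explicit im2col transformation computes, here is a minimal pure-Python reference for NHWC input with stride=1, dilation=1, and zero padding. It is a sketch under stated assumptions, not this module's implementation: the function name `im2col_nhwc`, the nested-list tensor representation, and the (r, s, c) column ordering (chosen so the columns line up with a KRSC filter flattened to K x (R*S*C)) are all assumptions made for illustration.

```python
def im2col_nhwc(x, filter_h, filter_w, pad_h, pad_w):
    """Unfold an NHWC nested-list tensor into an (N*P*Q) x (R*S*C) matrix.

    Assumes stride=1 and dilation=1; out-of-bounds taps read as zero.
    """
    batch = len(x)
    in_h, in_w, channels = len(x[0]), len(x[0][0]), len(x[0][0][0])
    out_h = in_h + 2 * pad_h - filter_h + 1
    out_w = in_w + 2 * pad_w - filter_w + 1

    rows = []
    for n in range(batch):
        for p in range(out_h):
            for q in range(out_w):
                # One row per output pixel (n, p, q).
                row = []
                for r in range(filter_h):
                    for s in range(filter_w):
                        # Input coordinate touched by filter tap (r, s).
                        h = p + r - pad_h
                        w = q + s - pad_w
                        inside = 0 <= h < in_h and 0 <= w < in_w
                        for c in range(channels):
                            row.append(x[n][h][w][c] if inside else 0)
                rows.append(row)
    return rows


# Tiny example: a 1x2x2x1 input, 3x3 filter, padding 1.
x = [[[[1], [2]],
      [[3], [4]]]]
cols = im2col_nhwc(x, filter_h=3, filter_w=3, pad_h=1, pad_w=1)
```

Multiplying this matrix by the flattened filter gives the convolution output as a GEMM. The implicit variant produces the same result without materializing the matrix in memory.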