Mojo package
conv_sm100
SM100 Structured Convolution Kernels.
High-performance Conv2D for NVIDIA Blackwell (SM100) GPUs using implicit GEMM with hardware TMA im2col. Reuses infrastructure from sm100_structured matmul.
Supported: Conv2D fprop with stride=1, dilation=1, BF16/FP16.
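To illustrate what "implicit GEMM" means here, the following is a minimal CPU sketch in plain Python, not the package's API. It assumes NHWC activations, KRSC filters, and no padding; the function name `conv2d_implicit_gemm` and these layouts are illustrative choices, not taken from the package. It treats the convolution as a GEMM of shape (N*P*Q) x K with reduction length R*S*C, and computes each im2col element's address on the fly instead of materializing the im2col matrix. On SM100 that address generation is what the hardware TMA im2col mode is described as providing.

```python
def conv2d_implicit_gemm(x, w):
    """Conv2D fprop as an implicit GEMM (stride=1, dilation=1, no padding).

    x: nested lists shaped [N][H][W][C] (NHWC, assumed layout)
    w: nested lists shaped [K][R][S][C] (KRSC, assumed layout)
    Returns (out, (N, P, Q, K)), where out is a GEMM-shaped
    [N*P*Q][K] matrix.
    """
    n, h, wd, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    k, r, s = len(w), len(w[0]), len(w[0][0])
    p, q = h - r + 1, wd - s + 1          # output height and width

    gemm_m, gemm_n, gemm_k = n * p * q, k, r * s * c
    out = [[0.0] * gemm_n for _ in range(gemm_m)]

    for m in range(gemm_m):
        # Decompose the GEMM row index into (image, out_row, out_col).
        img, rem = divmod(m, p * q)
        op, oq = divmod(rem, q)
        for kk in range(gemm_k):
            # Decompose the reduction index into (filter_row, filter_col, channel).
            fr, rem2 = divmod(kk, s * c)
            fs, ch = divmod(rem2, c)
            # Implicit im2col: read the activation directly from the input.
            a = x[img][op + fr][oq + fs][ch]
            for col in range(gemm_n):
                out[m][col] += a * w[col][fr][fs][ch]
    return out, (n, p, q, k)


# Usage: a 1x3x3x1 input with a single 2x2 all-ones filter.
x = [[[[1.0], [2.0], [3.0]],
      [[4.0], [5.0], [6.0]],
      [[7.0], [8.0], [9.0]]]]
w = [[[[1.0], [1.0]],
      [[1.0], [1.0]]]]
out, shape = conv2d_implicit_gemm(x, w)
```

Each output row is the sum over one 2x2 window of the input, so the result has four rows (P = Q = 2) and one column (K = 1). A real kernel tiles these loops across thread blocks and feeds tensor cores; the index arithmetic is the part this sketch is meant to show.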
Modules
- conv2d: Public API for SM100 Conv2D forward propagation.
- conv2d_fprop_kernel: SM100 Conv2D Forward Propagation Kernel.
- conv_config: Configuration types for SM100 structured convolution kernels.
- conv_smem: Shared memory layout for SM100 Conv2D kernel.
- conv_tile_loader: Tile loader for SM100 convolution with hardware im2col TMA.
- dispatch: SM100 Conv2D dispatch for the nn conv op.
- epilogue_load_pipeline: Epilogue load pipeline types for SM100 Conv2D kernel.