Mojo package
sm100
SM100 Structured Convolution Kernels.
High-performance Conv2D for NVIDIA Blackwell (SM100) GPUs using implicit GEMM with hardware TMA im2col. Reuses infrastructure from sm100_structured matmul.
Supported: Conv2D fprop with stride=1, dilation=1, BF16/FP16.
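To make the "implicit GEMM" idea above concrete, here is a minimal pure-Python sketch, not the package's kernel. It assumes an NHWC-style layout with a single image, no padding, and the supported stride=1, dilation=1 case. The function names and the index mapping are illustrative assumptions. The im2col matrix is never materialized: each element is located by index arithmetic on demand, which is the job the source says the real kernel hands to hardware TMA im2col.

```python
def conv2d_direct(x, w):
    """Reference Conv2D fprop: x is [H][W][C], w is [K][R][S][C].

    Stride 1, dilation 1, no padding. Returns [P][Q][K].
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0]), len(w[0][0])
    P, Q = H - R + 1, W - S + 1
    out = [[[0] * K for _ in range(Q)] for _ in range(P)]
    for p in range(P):
        for q in range(Q):
            for k in range(K):
                out[p][q][k] = sum(
                    x[p + r][q + s][c] * w[k][r][s][c]
                    for r in range(R) for s in range(S) for c in range(C)
                )
    return out


def conv2d_implicit_gemm(x, w):
    """Same result, phrased as a GEMM with M = P*Q, N = K, Kg = R*S*C.

    The A operand (the im2col matrix) is computed on the fly from
    (row, column) indices instead of being stored in memory.
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0]), len(w[0][0])
    P, Q = H - R + 1, W - S + 1
    M, N, Kg = P * Q, K, R * S * C

    def split(kk):
        # Decompose a GEMM reduction index into (r, s, c) filter coordinates.
        r, rem = divmod(kk, S * C)
        s, c = divmod(rem, C)
        return r, s, c

    def a(m, kk):
        # Implicit im2col: map (output pixel, reduction index) to an input element.
        p, q = divmod(m, Q)
        r, s, c = split(kk)
        return x[p + r][q + s][c]

    def b(kk, n):
        # Filter viewed as a [Kg][N] matrix.
        r, s, c = split(kk)
        return w[n][r][s][c]

    flat = [[sum(a(m, kk) * b(kk, n) for kk in range(Kg)) for n in range(N)]
            for m in range(M)]
    # Reshape the [M][N] GEMM result back to [P][Q][K].
    return [[flat[p * Q + q] for q in range(Q)] for p in range(P)]
```

Both functions return the same tensor for any valid input. The second form maps onto a matmul pipeline, which is presumably why the package can reuse the sm100_structured matmul infrastructure.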
Modules
- conv2d: Public API for SM100 Conv2D forward propagation.
- conv2d_fprop_kernel: SM100 Conv2D Forward Propagation Kernel.
- conv_config: Configuration types for SM100 structured convolution kernels.
- conv_smem: Shared memory layout for SM100 Conv2D kernel.
- conv_tile_loader: Tile loader for SM100 convolution with hardware im2col TMA.
- dispatch: SM100 Conv2D dispatch for the nn conv op.
- epilogue_load_pipeline: Epilogue load pipeline types for SM100 Conv2D kernel.
- qslice_conv3d: Q-slice 3-D conv via Q sequential SM100 2-D conv calls with fp32 accumulator.
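The qslice_conv3d entry describes building a 3-D convolution from sequential 2-D convolutions. The sketch below shows one way such a decomposition can work. It assumes that "Q" counts the filter's slices along the depth axis, which the page does not define. The function names, the tensor layouts, and the use of Python floats (fp64) as a stand-in for the fp32 accumulator are all illustrative assumptions, not the module's API.

```python
def conv2d(x, w):
    """2-D conv, stride 1, dilation 1, no padding.

    x is [H][W][C], w is [K][R][S][C]; returns [P][Q][K].
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0]), len(w[0][0])
    P, Q = H - R + 1, W - S + 1
    return [[[sum(x[p + r][q + s][c] * w[k][r][s][c]
                  for r in range(R) for s in range(S) for c in range(C))
              for k in range(K)]
             for q in range(Q)]
            for p in range(P)]


def conv3d_by_slices(x, w):
    """3-D conv as a sum of 2-D convs, one per filter depth slice.

    x is [D][H][W][C], w is [K][T][R][S][C]; returns [Z][P][Q][K].
    T is the number of filter depth slices (assumed to be the "Q" in the
    module description).
    """
    D, T = len(x), len(w[0])
    Z = D - T + 1
    out = []
    for z in range(Z):
        acc = None  # wide accumulator shared by all slice calls
        for t in range(T):
            w_t = [w_k[t] for w_k in w]   # 2-D filter bank for slice t
            part = conv2d(x[z + t], w_t)  # one 2-D conv call
            if acc is None:
                acc = [[[float(v) for v in row] for row in plane] for plane in part]
            else:
                for p, plane in enumerate(part):
                    for q, row in enumerate(plane):
                        for k, v in enumerate(row):
                            acc[p][q][k] += v
        out.append(acc)
    return out
```

Accumulating across slice calls in a wider type than the BF16/FP16 inputs avoids rounding each partial sum back to 16 bits before the next call adds to it.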