Mojo struct
Conv2dConfig
struct Conv2dConfig[act_type: DType, filter_type: DType, out_type: DType]
Configuration for SM100 Conv2D kernel.
This mirrors MatmulConfig but with conv-specific semantics.
Parameters
- act_type (DType): Activation (input) data type.
- filter_type (DType): Filter (weight) data type.
- out_type (DType): Output data type.
Fields
- block_tile_shape (IndexList[3])
- mma_shape (IndexList[3])
- output_tile_shape (IndexList[2])
- num_pipeline_stages (Int)
- num_output_stages (Int)
- num_accum_pipeline_stages (Int)
- num_clc_pipeline_stages (Int)
- k_group_size (Int)
- cluster_shape (IndexList[3])
- cta_group (Int)
- a_swizzle (TensorMapSwizzle)
- b_swizzle (TensorMapSwizzle)
- c_swizzle (TensorMapSwizzle)
- block_swizzle_size (Int)
Implemented traits
AnyType, Copyable, ImplicitlyDestructible, Movable
comptime members
__copyinit__is_trivial
comptime __copyinit__is_trivial = True
__del__is_trivial
comptime __del__is_trivial = True
__moveinit__is_trivial
comptime __moveinit__is_trivial = True
Methods
accum_type
default_bf16
static default_bf16() -> Self
Default configuration for BF16 conv2d (VAE-optimized).
Uses 2-SM cluster mode (cta_group=2) with 128×128 block tiles, matching the standard SM100 matmul configuration pattern.
For cta_group=2 with MMA_M=256, MMA_N=256:
- block_tile_shape = (128, 128, 64): the M and N extents are the MMA extents divided by cta_group (256 // 2 = 128); the K extent is 64
- output_tile_shape = (128, 32) - each output tile is 128 rows × 32 cols
- cluster_shape[0] = 2 (2 CTAs in M dimension)
Pipeline stages are dynamically computed to maximize SMEM utilization.
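The tile-shape relations listed for default_bf16 can be checked with a few lines of arithmetic. The sketch below is plain Python, not the Mojo API: the helper names block_tile_mn and output_tiles_per_block are hypothetical, and the K extent of 64 is taken from the description above rather than derived.

```python
def block_tile_mn(mma_m: int, mma_n: int, cta_group: int) -> tuple[int, int]:
    """Per-CTA M and N extents when one MMA is shared by cta_group CTAs.

    Hypothetical helper for illustration; it is not part of Conv2dConfig.
    """
    if mma_m % cta_group or mma_n % cta_group:
        raise ValueError("MMA extents must be divisible by cta_group")
    return mma_m // cta_group, mma_n // cta_group


def output_tiles_per_block(block_n: int, out_tile_n: int) -> int:
    """Number of output tiles needed to cover the N extent of one block tile."""
    return block_n // out_tile_n


# Values from the default_bf16 description: MMA_M = MMA_N = 256, cta_group = 2.
bm, bn = block_tile_mn(256, 256, 2)
block_tile_shape = (bm, bn, 64)  # K extent of 64 is copied from the text
tiles = output_tiles_per_block(bn, 32)  # output_tile_shape = (128, 32)

print(block_tile_shape, tiles)
```

Under these assumptions each 128×128 block tile is covered by four 128×32 output tiles.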
default_bf16_1sm
static default_bf16_1sm() -> Self
Default configuration for BF16 conv2d using 1-SM mode.
Uses 1-SM mode (cta_group=1) with 128×128 block tiles, matching the CUTLASS example configuration.
For cta_group=1 with MMA_M=128, MMA_N=128, MMA_K=16:
- block_tile_shape = (128, 128, 64) for tile sizes
- mma_shape = (128, 128, 16) for MMA instruction shape
- output_tile_shape = (128, 32) with c_swizzle=SWIZZLE_64B
- cluster_shape = (1, 1, 1) (single CTA per cluster)
Pipeline stages are dynamically computed to maximize SMEM utilization.
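The 1-SM numbers above fit together as sketched below in plain Python (not the Mojo API). The stage estimate is an assumption for illustration only: it supposes that each pipeline stage holds one BF16 activation tile and one BF16 filter tile, and it uses a made-up shared-memory budget. The helper names stage_bytes and max_stages are hypothetical, and the kernel's real stage computation may account for more buffers.

```python
BF16_BYTES = 2

block_tile_shape = (128, 128, 64)  # (BM, BN, BK) from the description
mma_shape = (128, 128, 16)         # (MMA_M, MMA_N, MMA_K)
output_tile_shape = (128, 32)
swizzle_bytes = 64                 # SWIZZLE_64B

# MMA instructions along K needed to consume one block tile.
k_steps = block_tile_shape[2] // mma_shape[2]

# Elements per 64-byte swizzle row, assuming a BF16 output type.
swizzle_elems = swizzle_bytes // BF16_BYTES


def stage_bytes(block_tile: tuple[int, int, int], elem_bytes: int) -> int:
    """Bytes for one activation tile plus one filter tile (assumed layout)."""
    bm, bn, bk = block_tile
    return (bm * bk + bn * bk) * elem_bytes


def max_stages(smem_budget_bytes: int, block_tile, elem_bytes: int) -> int:
    """Largest stage count that fits in a given budget (hypothetical helper)."""
    return smem_budget_bytes // stage_bytes(block_tile, elem_bytes)


per_stage = stage_bytes(block_tile_shape, BF16_BYTES)
# 160 KiB is an arbitrary example budget, not a hardware figure.
stages = max_stages(160 * 1024, block_tile_shape, BF16_BYTES)

print(k_steps, swizzle_elems, per_stage, stages)
```

With BF16 output, a 64-byte swizzle row spans 32 elements, which matches the 32-column width of output_tile_shape.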
default_fp16
static default_fp16() -> Self
Default configuration for FP16 conv2d.