Mojo struct
Conv2dConfig
struct Conv2dConfig[act_type: DType, filter_type: DType, out_type: DType]
Configuration for SM100 Conv2D kernel.
This mirrors MatmulConfig but with conv-specific semantics.
Parameters

- act_type (DType): Activation (input) data type.
- filter_type (DType): Filter (weight) data type.
- out_type (DType): Output data type.
Fields

- block_tile_shape (IndexList[3]):
- mma_shape (IndexList[3]):
- output_tile_shape (IndexList[2]):
- num_pipeline_stages (Int):
- num_output_stages (Int):
- num_accum_pipeline_stages (Int):
- num_clc_pipeline_stages (Int):
- k_group_size (Int):
- cluster_shape (IndexList[3]):
- cta_group (Int):
- a_swizzle (TensorMapSwizzle):
- b_swizzle (TensorMapSwizzle):
- c_swizzle (TensorMapSwizzle):
- block_swizzle_size (Int):
Implemented traits

AnyType, Copyable, ImplicitlyDestructible, Movable
Methods

accum_type

default_bf16
static default_bf16[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]() -> Self
Default configuration for BF16 conv2d (VAE-optimized).
Uses 2-SM cluster mode (cta_group=2) with 128×128 block tiles, matching the standard SM100 matmul configuration pattern.
For cta_group=2 with MMA_M=256, MMA_N=256:
- block_tile_shape = mma_shape // cta_group = (128, 128, BK)
- output_tile_shape = (128, 32) - each output tile is 128 rows × 32 cols
- cluster_shape[0] = 2 (2 CTAs in M dimension)
BK is chosen to match the activation/filter swizzle so that the TMA descriptor's inner dim (channels_per_pixel = swizzle_bytes/sizeof) divides BK. This keeps K = CRS a whole multiple of BK whenever the dispatch guarantees C*sizeof(act_type) is a multiple of swizzle_bytes.
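The divisibility argument above can be made concrete with a small worked example. This is an illustrative sketch only; the names (swizzle_bytes, elem_bytes, channels_per_pixel, BK) come from the prose, not from the actual implementation:

```mojo
# Worked example of the BK constraint described above (illustrative values).
alias swizzle_bytes = 128                               # SWIZZLE_128B inner row
alias elem_bytes = 2                                    # sizeof(bf16)
alias channels_per_pixel = swizzle_bytes // elem_bytes  # 128 / 2 = 64 elements
alias BK = channels_per_pixel                           # BK chosen as a multiple of 64
# If C * elem_bytes is a multiple of swizzle_bytes, then C is a multiple of
# channels_per_pixel, and so is K = C*R*S -- hence BK divides K evenly.
```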
Pipeline stages are dynamically computed to maximize SMEM utilization.
Parameters:

- swizzle (TensorMapSwizzle): Activation/filter swizzle mode. Defaults to SWIZZLE_128B (inner row = 128 bytes). Use SWIZZLE_64B for C_in where C*sizeof(dtype) is 64B-aligned but not 128B-aligned (e.g. bf16 C_in=96).
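A minimal usage sketch for a bf16 pipeline, assuming all three dtype parameters are bf16. The call-site spelling below is illustrative, not taken from the library:

```mojo
# Hypothetical call site: instantiate the default 2-SM bf16 configuration.
alias cfg = Conv2dConfig[
    DType.bfloat16,  # act_type
    DType.bfloat16,  # filter_type
    DType.bfloat16,  # out_type
].default_bf16[TensorMapSwizzle.SWIZZLE_128B]()
# Per the description above, this should yield cta_group=2,
# block_tile_shape=(128, 128, BK), and cluster_shape[0]=2.
```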
default_bf16_1sm
static default_bf16_1sm[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, num_pipeline_stages_override: Int = 0]() -> Self
Default configuration for BF16 conv2d using 1-SM mode.
Uses 1-SM mode (cta_group=1) with 128×128 block tiles, matching the CUTLASS example configuration.
For cta_group=1 with MMA_M=128, MMA_N=128, MMA_K=16:
- block_tile_shape = (128, 128, BK) for tile sizes
- mma_shape = (128, 128, 16) for MMA instruction shape
- output_tile_shape = (128, 32) with c_swizzle=SWIZZLE_64B
- cluster_shape = (1, 1, 1) (single CTA per cluster)
BK is chosen to match the activation/filter swizzle; see default_bf16 for details.
Pipeline stages are dynamically computed to maximize SMEM utilization
unless num_pipeline_stages_override is provided.
Parameters:

- swizzle (TensorMapSwizzle): Activation/filter swizzle mode. Defaults to SWIZZLE_128B. Use SWIZZLE_64B for C_in where C*sizeof(dtype) is 64B-aligned but not 128B-aligned (e.g. bf16 C_in=96).
- num_pipeline_stages_override (Int): If > 0, use this value for num_pipeline_stages instead of the auto-sizer. Used when the auto-sizer over-estimates the stage budget (e.g. at smaller BK it doesn't account for conv-specific SMEM like SourceTiles).
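A hedged sketch of pinning the pipeline depth in 1-SM mode instead of relying on the auto-sizer. The override value 4 and the dtype choices are purely illustrative:

```mojo
# Hypothetical call site: 1-SM config with a 64B swizzle and a fixed stage count.
alias cfg_1sm = Conv2dConfig[
    DType.bfloat16, DType.bfloat16, DType.bfloat16
].default_bf16_1sm[
    swizzle = TensorMapSwizzle.SWIZZLE_64B,  # e.g. bf16 with C_in = 96
    num_pipeline_stages_override = 4,        # illustrative fixed depth
]()
```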
default_fp16
static default_fp16[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]() -> Self
Default configuration for FP16 conv2d.
Parameters:

- swizzle (TensorMapSwizzle): Activation/filter swizzle mode. See default_bf16.