
Mojo struct

Conv2dConfig

struct Conv2dConfig[act_type: DType, filter_type: DType, out_type: DType]

Configuration for SM100 Conv2D kernel.

This mirrors MatmulConfig but with conv-specific semantics.

Parameters

  • act_type (DType): Activation (input) data type.
  • filter_type (DType): Filter (weight) data type.
  • out_type (DType): Output data type.

Fields

  • block_tile_shape (IndexList[3]):
  • mma_shape (IndexList[3]):
  • output_tile_shape (IndexList[2]):
  • num_pipeline_stages (Int):
  • num_output_stages (Int):
  • num_accum_pipeline_stages (Int):
  • num_clc_pipeline_stages (Int):
  • k_group_size (Int):
  • cluster_shape (IndexList[3]):
  • cta_group (Int):
  • a_swizzle (TensorMapSwizzle):
  • b_swizzle (TensorMapSwizzle):
  • c_swizzle (TensorMapSwizzle):
  • block_swizzle_size (Int):

Implemented traits

AnyType, Copyable, ImplicitlyDestructible, Movable

Methods

accum_type

static accum_type() -> DType

The accumulator type derived from the output type.

Returns:

DType
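As a rough illustration of what "derived from output type" could mean, here is a hypothetical Python sketch of one plausible derivation rule (half-precision outputs accumulating in float32). The rule and the `accum_type` helper are assumptions for illustration, not the kernel's actual mapping:

```python
# Hypothetical illustration of an accumulator-type rule (assumed, not the
# library's actual mapping): narrow float outputs accumulate in float32 to
# preserve precision across the K reduction.
def accum_type(out_type: str) -> str:
    if out_type in ("bfloat16", "float16"):
        return "float32"  # widen half-precision outputs (assumed rule)
    return out_type       # full-precision outputs accumulate as-is

print(accum_type("bfloat16"))  # float32
print(accum_type("float32"))   # float32
```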

default_bf16

static default_bf16[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]() -> Self

Default configuration for BF16 conv2d (VAE-optimized).

Uses 2-SM cluster mode (cta_group=2) with 128×128 block tiles, matching the standard SM100 matmul configuration pattern.

For cta_group=2 with MMA_M=256, MMA_N=256:

  • block_tile_shape = mma_shape // cta_group = (128, 128, BK)
  • output_tile_shape = (128, 32): each output tile is 128 rows × 32 cols
  • cluster_shape[0] = 2 (2 CTAs in the M dimension)

BK is chosen to match the activation/filter swizzle so that the TMA descriptor's inner dimension (channels_per_pixel = swizzle_bytes / sizeof(act_type)) divides BK. This keeps K = C*R*S a whole multiple of BK whenever the dispatch guarantees that C*sizeof(act_type) is a multiple of swizzle_bytes.
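The divisibility argument above can be sketched numerically. This is a Python illustration of the arithmetic only, not the kernel's code; the `channels_per_pixel` helper and the example C, R, S values are assumed for illustration:

```python
# Python sketch of the BK arithmetic described above (illustration only).
SIZEOF_BF16 = 2  # bytes per bf16 element

def channels_per_pixel(swizzle_bytes: int, elem_bytes: int) -> int:
    # Inner dimension of the TMA descriptor, in elements.
    return swizzle_bytes // elem_bytes

# SWIZZLE_128B with bf16: inner dim = 128 // 2 = 64 elements, so BK = 64.
BK = channels_per_pixel(128, SIZEOF_BF16)

# Example shapes: C=128 satisfies the dispatch guarantee
# (128 * 2 = 256 bytes, a multiple of 128 B); 3x3 filter.
C, R, S = 128, 3, 3
K = C * R * S
print(BK, K % BK)  # 64 0  -> K = CRS is a whole multiple of BK
```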

Pipeline stages are dynamically computed to maximize SMEM utilization.

Parameters:

  • swizzle (TensorMapSwizzle): Activation/filter swizzle mode. Defaults to SWIZZLE_128B (inner row = 128 bytes). Use SWIZZLE_64B when C*sizeof(dtype) is 64B-aligned but not 128B-aligned (e.g. bf16 C_in=96).
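A quick worked check of the SWIZZLE_64B guidance, in plain Python arithmetic (the `row_bytes` helper is for illustration only):

```python
# Worked check of the guidance above: bf16 is 2 bytes, so C_in = 96 gives a
# 96 * 2 = 192-byte channel row, which is a multiple of 64 B but not 128 B,
# so SWIZZLE_64B applies where SWIZZLE_128B would not.
def row_bytes(c_in: int, elem_bytes: int) -> int:
    return c_in * elem_bytes

rb = row_bytes(96, 2)
print(rb, rb % 64 == 0, rb % 128 == 0)  # 192 True False
```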

default_bf16_1sm

static default_bf16_1sm[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, num_pipeline_stages_override: Int = 0]() -> Self

Default configuration for BF16 conv2d using 1-SM mode.

Uses 1-SM mode (cta_group=1) with 128×128 block tiles, matching the CUTLASS example configuration.

For cta_group=1 with MMA_M=128, MMA_N=128, MMA_K=16:

  • block_tile_shape = (128, 128, BK) for tile sizes
  • mma_shape = (128, 128, 16) for the MMA instruction shape
  • output_tile_shape = (128, 32) with c_swizzle=SWIZZLE_64B
  • cluster_shape = (1, 1, 1) (single CTA per cluster)

BK is chosen to match the activation/filter swizzle; see default_bf16 for details.
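The relationship between mma_shape, cta_group, and block_tile_shape described in the two defaults can be sketched as follows. This is illustrative arithmetic only, following the shapes stated above; `block_tile` is a hypothetical helper name, not a library function:

```python
# Illustrative arithmetic: how block_tile_shape relates to mma_shape and
# cta_group per the descriptions above.
def block_tile(mma_shape, cta_group, bk):
    mma_m, mma_n, _ = mma_shape
    # 2-SM mode: the MMA spans the cluster, so each CTA's tile is the MMA
    # shape divided by cta_group; 1-SM mode divides by 1 (no change).
    return (mma_m // cta_group, mma_n // cta_group, bk)

# default_bf16: cta_group=2, MMA 256x256 -> 128x128 block tiles.
print(block_tile((256, 256, 16), 2, 64))  # (128, 128, 64)
# default_bf16_1sm: cta_group=1, MMA 128x128 -> 128x128 block tiles.
print(block_tile((128, 128, 16), 1, 64))  # (128, 128, 64)
```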

Pipeline stages are dynamically computed to maximize SMEM utilization unless num_pipeline_stages_override is provided.

Parameters:

  • swizzle (TensorMapSwizzle): Activation/filter swizzle mode. Defaults to SWIZZLE_128B. Use SWIZZLE_64B when C*sizeof(dtype) is 64B-aligned but not 128B-aligned (e.g. bf16 C_in=96).
  • num_pipeline_stages_override (Int): If > 0, use this value for num_pipeline_stages instead of the auto-sizer. Used when the auto-sizer over-estimates the stage budget (e.g. at smaller BK it doesn't account for conv-specific SMEM like SourceTiles).
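For intuition, an auto-sizer of this kind typically divides a shared-memory budget by the per-stage footprint. The sketch below is a hypothetical illustration (`auto_stages`, the budget, and the footprint numbers are all assumptions, not the library's actual sizer); it shows why an unaccounted extra buffer can make the computed stage count too high:

```python
# Hypothetical sketch of a pipeline-stage auto-sizer (all names and numbers
# here are assumptions for illustration, not the library's actual logic).
# Stages = how many input tile pairs fit in the remaining SMEM budget.
def auto_stages(smem_budget: int, bytes_per_stage: int, extra_smem: int = 0) -> int:
    # Conv-specific buffers (e.g. something like SourceTiles) shrink the
    # budget; if the sizer ignores them, it over-estimates the stage count.
    return (smem_budget - extra_smem) // bytes_per_stage

BUDGET = 228 * 1024        # usable SMEM per SM (illustrative)
PER_STAGE = 32 * 1024      # A-tile + B-tile bytes per stage (illustrative)

print(auto_stages(BUDGET, PER_STAGE))             # 7
print(auto_stages(BUDGET, PER_STAGE, 40 * 1024))  # 5 -> override needed
```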

default_fp16

static default_fp16[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]() -> Self

Default configuration for FP16 conv2d.

Parameters:

  • swizzle (TensorMapSwizzle): Activation/filter swizzle mode. See default_bf16.