
Mojo struct

Conv2dConfig

struct Conv2dConfig[act_type: DType, filter_type: DType, out_type: DType]

Configuration for the SM100 Conv2D kernel.

This mirrors MatmulConfig but with conv-specific semantics.

Parameters

  • act_type (DType): Activation (input) data type.
  • filter_type (DType): Filter (weight) data type.
  • out_type (DType): Output data type.

Fields

  • block_tile_shape (IndexList[3])
  • mma_shape (IndexList[3])
  • output_tile_shape (IndexList[2])
  • num_pipeline_stages (Int)
  • num_output_stages (Int)
  • num_accum_pipeline_stages (Int)
  • num_clc_pipeline_stages (Int)
  • k_group_size (Int)
  • cluster_shape (IndexList[3])
  • cta_group (Int)
  • a_swizzle (TensorMapSwizzle)
  • b_swizzle (TensorMapSwizzle)
  • c_swizzle (TensorMapSwizzle)
  • block_swizzle_size (Int)

Implemented traits

AnyType, Copyable, ImplicitlyDestructible, Movable

comptime members

__copyinit__is_trivial

comptime __copyinit__is_trivial = True

__del__is_trivial

comptime __del__is_trivial = True

__moveinit__is_trivial

comptime __moveinit__is_trivial = True

Methods

accum_type

static accum_type() -> DType

Accumulator type derived from output type.

Returns:

DType

default_bf16

static default_bf16() -> Self

Default configuration for BF16 conv2d (VAE-optimized).

Uses 2-SM cluster mode (cta_group=2) with 128×128 block tiles, matching the standard SM100 matmul configuration pattern.

For cta_group=2 with MMA_M=256, MMA_N=256:

  • block_tile_shape = (128, 128, 64): the M and N extents are MMA_M and MMA_N divided by cta_group
  • output_tile_shape = (128, 32): each output tile is 128 rows × 32 columns
  • cluster_shape[0] = 2 (2 CTAs along the M dimension)
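The relationship between the MMA shape and the per-CTA block tile listed above can be checked with a few lines of arithmetic. The following is a minimal Python sketch of that arithmetic only, not the Mojo API: the helper name `block_tile_from_mma` is hypothetical, and the K extent of 64 is passed in directly because the text does not state how it is derived.

```python
from typing import Tuple


def block_tile_from_mma(
    mma_m: int, mma_n: int, cta_group: int, block_k: int
) -> Tuple[int, int, int]:
    """Hypothetical helper: split the MMA M and N extents across the CTA group.

    block_k is supplied by the caller; this sketch does not derive it.
    """
    if mma_m % cta_group != 0 or mma_n % cta_group != 0:
        raise ValueError("MMA extents must be divisible by cta_group")
    return (mma_m // cta_group, mma_n // cta_group, block_k)


# Values taken from the default_bf16 description: MMA_M = MMA_N = 256, cta_group = 2.
block_tile = block_tile_from_mma(256, 256, cta_group=2, block_k=64)
print(block_tile)  # (128, 128, 64)
```

With cta_group=1 the same helper returns the MMA M and N extents unchanged, which agrees with the 128×128 tiles described for the 1-SM configuration.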

The number of pipeline stages is computed dynamically to maximize shared memory (SMEM) utilization.
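One plausible way to pick a stage count from a shared memory budget is sketched below in Python. This is an illustration under stated assumptions, not the kernel's actual rule: the SMEM budget, the number of output stages, the reserved bytes, and the function name `max_pipeline_stages` are all hypothetical, and only the tile shapes come from the description above.

```python
BF16_BYTES = 2  # size of one BF16 element


def max_pipeline_stages(
    block_tile: tuple,
    output_tile: tuple,
    num_output_stages: int,
    smem_budget_bytes: int,
    reserved_bytes: int,
) -> int:
    """Hypothetical estimate: fit as many A+B input stages as the budget allows."""
    bm, bn, bk = block_tile
    # One input stage holds an A tile (BM x BK) and a B tile (BN x BK).
    stage_bytes = (bm * bk + bn * bk) * BF16_BYTES
    # Output staging buffers are allocated once, outside the input pipeline.
    out_bytes = num_output_stages * output_tile[0] * output_tile[1] * BF16_BYTES
    available = smem_budget_bytes - out_bytes - reserved_bytes
    return max(available // stage_bytes, 0)


stages = max_pipeline_stages(
    block_tile=(128, 128, 64),      # from the text
    output_tile=(128, 32),          # from the text
    num_output_stages=2,            # assumption
    smem_budget_bytes=227 * 1024,   # assumption: usable SMEM per CTA
    reserved_bytes=1024,            # assumption: barriers and bookkeeping
)
print(stages)  # 6
```

Under these assumed numbers each input stage takes 32 KiB, so six stages fit. The real computation may account for additional buffers or for fields such as k_group_size.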

default_bf16_1sm

static default_bf16_1sm() -> Self

Default configuration for BF16 conv2d using 1-SM mode.

Uses 1-SM mode (cta_group=1) with 128×128 block tiles, matching the CUTLASS example configuration.

For cta_group=1 with MMA_M=128, MMA_N=128, MMA_K=16:

  • block_tile_shape = (128, 128, 64): block tile sizes (M, N, K)
  • mma_shape = (128, 128, 16): MMA instruction shape (M, N, K)
  • output_tile_shape = (128, 32) with c_swizzle=SWIZZLE_64B
  • cluster_shape = (1, 1, 1) (single CTA per cluster)
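Two relationships among the values listed above can be verified by hand: how many MMA instructions along K cover one block tile, and how wide one output-tile row is in bytes. The Python sketch below works through both. It assumes BF16 output (2 bytes per element), and the function names are hypothetical. That the 64-byte row width lines up with c_swizzle=SWIZZLE_64B is an observation about these numbers, not a rule confirmed by the source.

```python
BF16_BYTES = 2  # assumption: output elements are BF16


def mma_steps_per_k_tile(block_k: int, mma_k: int) -> int:
    """Number of MMA instructions along K needed to consume one block tile."""
    if block_k % mma_k != 0:
        raise ValueError("block K must be a multiple of MMA K")
    return block_k // mma_k


def output_row_bytes(output_tile_cols: int, elem_bytes: int) -> int:
    """Bytes in one row of an output tile."""
    return output_tile_cols * elem_bytes


k_steps = mma_steps_per_k_tile(block_k=64, mma_k=16)
row_bytes = output_row_bytes(output_tile_cols=32, elem_bytes=BF16_BYTES)
print(k_steps, row_bytes)  # 4 64
```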

The number of pipeline stages is computed dynamically to maximize shared memory (SMEM) utilization.

default_fp16

static default_fp16() -> Self

Default configuration for FP16 conv2d.
