
Mojo struct

Conv2dConfig

struct Conv2dConfig[act_type: DType, filter_type: DType, out_type: DType]

Configuration for SM100 Conv2D kernel.

This mirrors MatmulConfig but with conv-specific semantics.

Parameters​

  • ​act_type (DType): Activation (input) data type.
  • ​filter_type (DType): Filter (weight) data type.
  • ​out_type (DType): Output data type.

Fields​

  • ​block_tile_shape (IndexList[Int(3)]):
  • ​mma_shape (IndexList[Int(3)]):
  • ​output_tile_shape (IndexList[Int(2)]):
  • ​num_pipeline_stages (Int):
  • ​num_output_stages (Int):
  • ​num_accum_pipeline_stages (Int):
  • ​num_clc_pipeline_stages (Int):
  • ​k_group_size (Int):
  • ​cluster_shape (IndexList[Int(3)]):
  • ​cta_group (Int):
  • ​a_swizzle (TensorMapSwizzle):
  • ​b_swizzle (TensorMapSwizzle):
  • ​c_swizzle (TensorMapSwizzle):
  • ​block_swizzle_size (Int):

Implemented traits​

AnyType, Copyable, ImplicitlyDeletable, Movable

Methods​

accum_type​

static def accum_type() -> DType

Accumulator type derived from output type.

Returns:

DType

default_bf16​

static def default_bf16[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]() -> Self

Default configuration for BF16 conv2d (VAE-optimized).

Uses 2-SM cluster mode (cta_group=2) with 128×128 block tiles, matching the standard SM100 matmul configuration pattern.

For cta_group=2 with MMA_M=256, MMA_N=256:

  • block_tile_shape = mma_shape // cta_group = (128, 128, BK)
  • output_tile_shape = (128, 32) - each output tile is 128 rows × 32 cols
  • cluster_shape[0] = 2 (2 CTAs in M dimension)
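The relationship between the MMA shape and the per-CTA block tile can be checked with a few lines of arithmetic. The following is a plain Python sketch, not the Mojo API: `block_tile_mn` is a hypothetical helper, and it assumes the division by cta_group applies only to the M and N extents, since the K extent of the block tile is BK and is chosen separately.

```python
def block_tile_mn(mma_m: int, mma_n: int, cta_group: int) -> tuple[int, int]:
    """Per-CTA block tile extent in M and N for a given MMA shape.

    Hypothetical helper: models block_tile_shape = mma_shape // cta_group
    for the M and N extents only (the K extent, BK, is chosen separately).
    """
    return (mma_m // cta_group, mma_n // cta_group)


# 2-SM mode: a 256x256 MMA split across cta_group=2.
two_sm = block_tile_mn(256, 256, cta_group=2)

# 1-SM mode: a 128x128 MMA handled with cta_group=1.
one_sm = block_tile_mn(128, 128, cta_group=1)

print(two_sm, one_sm)  # (128, 128) (128, 128)
```

Under these assumptions both modes end up with the same 128×128 block tile in M and N; what changes is the MMA shape and the cta_group.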

BK is chosen to match the activation/filter swizzle so that the TMA descriptor's inner dimension (channels_per_pixel = swizzle_bytes / sizeof(act_type)) divides BK. As a result, K = C*R*S (input channels times filter height times filter width) is a whole multiple of BK whenever the dispatch guarantees that C*sizeof(act_type) is a multiple of swizzle_bytes.
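The divisibility argument can be made concrete with a small Python sketch. It is an illustration of the arithmetic only, not the kernel's code: the helper names are hypothetical, and it assumes BK equals swizzle_bytes / sizeof(act_type), the smallest value the inner dimension divides.

```python
BF16_BYTES = 2  # size of one bf16 element in bytes


def channels_per_pixel(swizzle_bytes: int, elem_bytes: int) -> int:
    """Elements that fit in one swizzled inner row (hypothetical helper)."""
    return swizzle_bytes // elem_bytes


def k_is_multiple_of_bk(c: int, r: int, s: int, bk: int) -> bool:
    """True when the reduction extent K = C*R*S splits into whole BK tiles."""
    return (c * r * s) % bk == 0


# Assumption: BK is set equal to channels_per_pixel.
bk_128b = channels_per_pixel(128, BF16_BYTES)  # 64 elements
bk_64b = channels_per_pixel(64, BF16_BYTES)    # 32 elements

# C=128 with a 3x3 filter: 128 * 2 = 256 bytes, a multiple of 128.
print(k_is_multiple_of_bk(128, 3, 3, bk_128b))  # True  (K = 1152 = 18 * 64)

# C=96 with a 3x3 filter: 96 * 2 = 192 bytes, not a multiple of 128.
print(k_is_multiple_of_bk(96, 3, 3, bk_128b))   # False (K = 864, remainder 32)

# The same C=96 case works once BK follows the 64-byte swizzle.
print(k_is_multiple_of_bk(96, 3, 3, bk_64b))    # True  (K = 864 = 27 * 32)
```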

Pipeline stages are dynamically computed to maximize SMEM utilization.

Parameters:

  • ​swizzle (TensorMapSwizzle): Activation/filter swizzle mode. Defaults to SWIZZLE_128B (inner row = 128 bytes). Use SWIZZLE_64B when C_in*sizeof(dtype) is a multiple of 64 bytes but not of 128 bytes (e.g. bf16 with C_in=96, which gives 192 bytes).
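The swizzle choice described for this parameter amounts to a simple rule on the byte width of one pixel's channels. A minimal Python sketch of that rule follows; `pick_swizzle_bytes` is a hypothetical helper, not part of the documented API, and it considers only the two modes mentioned here (128 and 64 bytes).

```python
from typing import Optional


def pick_swizzle_bytes(c_in: int, elem_bytes: int) -> Optional[int]:
    """Return the widest swizzle (in bytes) that C_in * sizeof(dtype) is a
    multiple of, or None if neither 128 nor 64 bytes fits.

    Hypothetical helper illustrating the selection rule only.
    """
    row_bytes = c_in * elem_bytes
    for swizzle_bytes in (128, 64):  # prefer the wider swizzle
        if row_bytes % swizzle_bytes == 0:
            return swizzle_bytes
    return None


BF16_BYTES = 2

print(pick_swizzle_bytes(128, BF16_BYTES))  # 128 (256 bytes per pixel)
print(pick_swizzle_bytes(96, BF16_BYTES))   # 64  (192 bytes per pixel)
print(pick_swizzle_bytes(8, BF16_BYTES))    # None (16 bytes per pixel)
```

A caller would then map the returned width to the corresponding TensorMapSwizzle value when instantiating the config; how the real dispatch handles the `None` case is not covered by this page.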

default_bf16_1sm​

static def default_bf16_1sm[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, num_pipeline_stages_override: Int = Int(0)]() -> Self

Default configuration for BF16 conv2d using 1-SM mode.

Uses 1-SM mode (cta_group=1) with 128×128 block tiles, matching the CUTLASS example configuration.

For cta_group=1 with MMA_M=128, MMA_N=128, MMA_K=16:

  • block_tile_shape = (128, 128, BK) for tile sizes
  • mma_shape = (128, 128, 16) for MMA instruction shape
  • output_tile_shape = (128, 32) with c_swizzle=SWIZZLE_64B
  • cluster_shape = (1, 1, 1) (single CTA per cluster)

BK is chosen to match the activation/filter swizzle; see default_bf16 for details.

Pipeline stages are dynamically computed to maximize SMEM utilization unless num_pipeline_stages_override is provided.

Parameters:

  • ​swizzle (TensorMapSwizzle): Activation/filter swizzle mode. Defaults to SWIZZLE_128B. Use SWIZZLE_64B when C_in*sizeof(dtype) is a multiple of 64 bytes but not of 128 bytes (e.g. bf16 with C_in=96, which gives 192 bytes).
  • ​num_pipeline_stages_override (Int): If > 0, use this value for num_pipeline_stages instead of the auto-sizer. Used when the auto-sizer over-estimates the stage budget (e.g. at smaller BK it doesn't account for conv-specific SMEM like SourceTiles).
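To see why an override can be needed, it helps to model how a stage count might be derived from a shared-memory budget. The Python sketch below is an assumption-heavy illustration, not the actual auto-sizer: the function name, the budget and reserved-byte figures, and the per-stage formula (one activation tile plus one filter tile per stage) are all hypothetical.

```python
def num_pipeline_stages(
    bm: int,
    bn: int,
    bk: int,
    elem_bytes: int,
    smem_budget_bytes: int,
    reserved_bytes: int = 0,
    override: int = 0,
) -> int:
    """Hypothetical stage sizing: fill the SMEM budget with whole stages,
    unless an explicit override (> 0) is given."""
    if override > 0:
        return override
    # Assumption: each stage holds one BM x BK and one BN x BK tile.
    per_stage_bytes = (bm * bk + bn * bk) * elem_bytes
    return (smem_budget_bytes - reserved_bytes) // per_stage_bytes


BUDGET = 200 * 1024  # hypothetical SMEM budget in bytes, for illustration only

# 128x128 block tile, BK=64, bf16: 32768 bytes per stage.
auto = num_pipeline_stages(128, 128, 64, 2, BUDGET)

# Same tile, but 32768 bytes are needed for other buffers the sizer ignores.
realistic = num_pipeline_stages(128, 128, 64, 2, BUDGET, reserved_bytes=32768)

# An explicit override bypasses the computation entirely.
forced = num_pipeline_stages(128, 128, 64, 2, BUDGET, override=4)

print(auto, realistic, forced)  # 6 5 4
```

In this toy model the sizer that ignores the extra buffers reports 6 stages while only 5 fit, which is the kind of over-estimate num_pipeline_stages_override exists to correct.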

default_fp16​

static def default_fp16[swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]() -> Self

Default configuration for FP16 conv2d.

Parameters:

  • ​swizzle (TensorMapSwizzle): Activation/filter swizzle mode. See default_bf16.