Mojo struct
Conv2dSmem
struct Conv2dSmem[act_type: DType, filter_type: DType, out_type: DType, *, config: Conv2dConfig[act_type, filter_type, out_type]]
Shared memory layout for the SM100 Conv2D fprop kernel.
This struct manages shared memory allocation for:
- Activation tiles (after im2col transformation)
- Filter tiles
- Output tiles for accumulation
- Synchronization barriers
The layout mirrors B200MatmulSmem but with conv-specific semantics:
- A tiles = im2col'd activation (M x K where M = NHW, K = CRS)
- B tiles = filter (transposed, K x N where K = CRS, N = K_out)
- C tiles = output (M x N)
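The tile semantics above follow the usual implicit-GEMM view of a forward convolution. As a rough illustration only, the Python sketch below computes the GEMM sizes from convolution sizes. It is not part of the Mojo library: the function name and the example problem are hypothetical, and it assumes that H and W in "NHW" are the output spatial extents.

```python
def implicit_gemm_dims(n, h_out, w_out, c, r, s, k_out):
    """Map Conv2D fprop sizes to implicit-GEMM sizes (M, N, K).

    Assumption: h_out and w_out are the OUTPUT height and width.
    """
    m = n * h_out * w_out  # rows of A and C: one per output pixel (NHW)
    k = c * r * s          # reduction dimension: one per filter tap (CRS)
    n_gemm = k_out         # columns of B and C: one per output channel
    return m, n_gemm, k


# Hypothetical problem: batch 2, 56x56 output, 64 input channels,
# 3x3 filter, 128 output channels.
print(implicit_gemm_dims(n=2, h_out=56, w_out=56, c=64, r=3, s=3, k_out=128))
# (6272, 128, 576)
```

In this view the A tile is a BM x BK slice of the 6272 x 576 im2col matrix, and the B tile is a BN x BK slice of the filter.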
Parameters
- act_type (DType): Activation data type.
- filter_type (DType): Filter data type.
- out_type (DType): Output data type.
- config (Conv2dConfig): Kernel configuration.
Fields
- input_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles)
- output_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].OutputTiles)
- source_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].SourceTiles)
- input_pipeline (Conv2dSmem[act_type, filter_type, out_type, config=config].InputPipeline)
- output_pipeline (Conv2dSmem[act_type, filter_type, out_type, config=config].OutputPipeline)
- clc_pipeline (Conv2dSmem[act_type, filter_type, out_type, config=config].ClcPipeline)
- tmem_dealloc_pipeline (Conv2dSmem[act_type, filter_type, out_type, config=config].TmemDeallocPipeline)
- epi_load_pipeline (Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadPipeline)
- load_order_barrier (Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarrier)
Implemented traits
AnyType,
ImplicitlyDestructible
comptime members
__del__is_trivial
comptime __del__is_trivial = True
AccumBarriers
comptime AccumBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].OutputPipeline.BarrierArray
act_smem_layout
comptime act_smem_layout = tile_layout_k_major[act_type, Conv2dSmem[act_type, filter_type, out_type, config=config].BM, Conv2dSmem[act_type, filter_type, out_type, config=config].BK, config.a_swizzle]()
ActTileArray
comptime ActTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles.ATileArray
BK
comptime BK = config.block_tile_shape.__getitem__[3, DType.int64, Int](2)
BM
comptime BM = config.block_tile_shape.__getitem__[3, DType.int64, Int](0)
BN
comptime BN = config.block_tile_shape.__getitem__[3, DType.int64, Int](1)
ClcBarriers
comptime ClcBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].ClcPipeline.BarrierArray
ClcPipeline
comptime ClcPipeline = ClcPipelineStorage[Conv2dSmem[act_type, filter_type, out_type, config=config].num_clc_pipeline_stages]
ClcResponse
comptime ClcResponse = Conv2dSmem[act_type, filter_type, out_type, config=config].ClcPipeline.ResponseArray
ClcThrottleBarriers
comptime ClcThrottleBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].ClcPipeline.ThrottleArray
EpiLoadBarriers
comptime EpiLoadBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadPipeline.BarrierArray
EpiLoadPipeline
comptime EpiLoadPipeline = EpiLoadPipelineStorage[Conv2dSmem[act_type, filter_type, out_type, config=config].num_epi_load_stages]
filter_smem_layout
comptime filter_smem_layout = tile_layout_k_major[filter_type, Conv2dSmem[act_type, filter_type, out_type, config=config].BN, Conv2dSmem[act_type, filter_type, out_type, config=config].BK, config.b_swizzle]()
FilterTileArray
comptime FilterTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles.BTileArray
InputBarriers
comptime InputBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].InputPipeline.BarrierArray
InputPipeline
comptime InputPipeline = InputPipelineStorage[Conv2dSmem[act_type, filter_type, out_type, config=config].num_group_pipeline_stages, StandardTilePayload[act_type, filter_type, Conv2dSmem[act_type, filter_type, out_type, config=config].BM, Conv2dSmem[act_type, filter_type, out_type, config=config].BK, Conv2dSmem[act_type, filter_type, out_type, config=config].BN, Conv2dSmem[act_type, filter_type, out_type, config=config].BK, Conv2dSmem[act_type, filter_type, out_type, config=config].num_pipeline_stages]]
InputTiles
comptime InputTiles = StandardTileStorage[act_type, filter_type, Conv2dSmem[act_type, filter_type, out_type, config=config].BM, Conv2dSmem[act_type, filter_type, out_type, config=config].BK, Conv2dSmem[act_type, filter_type, out_type, config=config].BN, Conv2dSmem[act_type, filter_type, out_type, config=config].BK, Conv2dSmem[act_type, filter_type, out_type, config=config].num_pipeline_stages]
LoadOrderBarrier
comptime LoadOrderBarrier = LoadOrderBarrierStorage
LoadOrderBarriers
comptime LoadOrderBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarrier.BarrierArray
num_accum_pipeline_stages
comptime num_accum_pipeline_stages = config.num_accum_pipeline_stages
num_clc_pipeline_stages
comptime num_clc_pipeline_stages = config.num_clc_pipeline_stages
num_epi_load_stages
comptime num_epi_load_stages = 2
num_group_pipeline_stages
comptime num_group_pipeline_stages = (Conv2dSmem[act_type, filter_type, out_type, config=config].num_pipeline_stages // config)
num_output_stages
comptime num_output_stages = config.num_output_stages
num_pipeline_stages
comptime num_pipeline_stages = config.num_pipeline_stages
out_smem_layout
comptime out_smem_layout = Layout.row_major(Conv2dSmem[act_type, filter_type, out_type, config=config].OutputM, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputN)
OutputM
comptime OutputM = config.output_tile_shape.__getitem__[2, DType.int64, Int](0)
OutputN
comptime OutputN = config.output_tile_shape.__getitem__[2, DType.int64, Int](1)
OutputPipeline
comptime OutputPipeline = OutputPipelineStorage[Conv2dSmem[act_type, filter_type, out_type, config=config].num_accum_pipeline_stages]
OutputTiles
comptime OutputTiles = OutputTileStorage[out_type, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputM, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputN, Conv2dSmem[act_type, filter_type, out_type, config=config].num_output_stages]
OutTileArray
comptime OutTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].OutputTiles.CTileArray
SourceTiles
comptime SourceTiles = SourceTileStorage[out_type, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputM, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputN, Conv2dSmem[act_type, filter_type, out_type, config=config].num_epi_load_stages]
SrcTileArray
comptime SrcTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].SourceTiles.SrcTileArrayLT
TmemAddr
comptime TmemAddr = Conv2dSmem[act_type, filter_type, out_type, config=config].TmemDeallocPipeline.AddrArray
TmemDealloc
comptime TmemDealloc = Conv2dSmem[act_type, filter_type, out_type, config=config].TmemDeallocPipeline.BarrierArray
TmemDeallocPipeline
comptime TmemDeallocPipeline = TmemDeallocStorage
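To get a feel for how the tile-related members above add up, the following Python sketch estimates the bytes used by tile storage alone. It assumes that each storage type holds one tile per stage with no padding, which this page does not confirm, and it ignores barriers, CLC responses, and alignment. The function names and the example configuration are hypothetical.

```python
def tile_bytes(rows, cols, elem_bytes):
    """Bytes for one dense rows x cols tile (no padding assumed)."""
    return rows * cols * elem_bytes


def estimate_tile_smem_bytes(bm, bn, bk, out_m, out_n,
                             act_bytes, filter_bytes, out_bytes,
                             num_pipeline_stages, num_output_stages,
                             num_epi_load_stages=2):
    """Rough tile-storage estimate; assumes stages x tile size per storage."""
    # InputTiles: one BM x BK activation tile and one BN x BK filter tile per stage.
    input_bytes = num_pipeline_stages * (
        tile_bytes(bm, bk, act_bytes) + tile_bytes(bn, bk, filter_bytes)
    )
    # OutputTiles: one OutputM x OutputN tile per output stage.
    output_bytes = num_output_stages * tile_bytes(out_m, out_n, out_bytes)
    # SourceTiles: one OutputM x OutputN tile per epilogue-load stage.
    source_bytes = num_epi_load_stages * tile_bytes(out_m, out_n, out_bytes)
    return input_bytes + output_bytes + source_bytes


# Hypothetical configuration: 128x128x64 block tile, 128x32 output tile,
# 2-byte elements, 4 input stages, 2 output stages.
print(estimate_tile_smem_bytes(128, 128, 64, 128, 32, 2, 2, 2, 4, 2))
# 163840
```

An estimate like this is only useful for checking a configuration against a shared memory budget; the real layout is determined by the storage types themselves.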
Methods
act_tiles
act_tiles(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].ActTileArray
Get activation tiles (im2col'd).
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].ActTileArray
filter_tiles
filter_tiles(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].FilterTileArray
Get filter tiles.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].FilterTileArray
out_tiles
out_tiles(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].OutTileArray
Get output tiles.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].OutTileArray
src_tiles
src_tiles(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].SrcTileArray
Get source C tiles (for residual operations).
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].SrcTileArray
input_barriers
input_barriers(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].InputBarriers
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].InputBarriers
accum_barriers
accum_barriers(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].AccumBarriers
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].AccumBarriers
clc_full
clc_full(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].ClcBarriers
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].ClcBarriers
clc_empty
clc_empty(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].ClcBarriers
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].ClcBarriers
clc_throttle
clc_throttle(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].ClcThrottleBarriers
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].ClcThrottleBarriers
clc_response
clc_response(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].ClcResponse
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].ClcResponse
tmem_dealloc
tmem_dealloc(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].TmemDealloc
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].TmemDealloc
tmem_addr
tmem_addr(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].TmemAddr
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].TmemAddr
epi_load_barriers
epi_load_barriers(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadBarriers
Get epilogue load pipeline barriers.
Synchronizes the EpilogueLoad warp (producer) with the Epilogue warps (consumers) when loading the source C tensor.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadBarriers
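The producer/consumer relationship described for the epilogue load pipeline resembles a classic bounded buffer. The Python sketch below is a CPU-side analogy only: it uses threads and semaphores in place of warps and shared memory barriers, uses a single consumer rather than several Epilogue warps, and all names are hypothetical. The stage count of 2 mirrors num_epi_load_stages.

```python
import threading

NUM_STAGES = 2  # mirrors num_epi_load_stages = 2


def run_pipeline(num_tiles, num_stages=NUM_STAGES):
    """Pass num_tiles items through a ring of num_stages slots."""
    slots = [None] * num_stages
    empty = threading.Semaphore(num_stages)  # counts free slots
    full = threading.Semaphore(0)            # counts filled slots
    consumed = []

    def producer():
        for i in range(num_tiles):
            empty.acquire()              # wait until a slot is free
            slots[i % num_stages] = i    # "load" tile i into its stage
            full.release()               # signal that the stage is filled

    def consumer():
        for i in range(num_tiles):
            full.acquire()               # wait until stage i is filled
            consumed.append(slots[i % num_stages])
            empty.release()              # hand the stage back to the producer

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


print(run_pipeline(8))
# [0, 1, 2, 3, 4, 5, 6, 7]
```

With two stages the producer can run at most two tiles ahead of the consumer, which is the kind of overlap a multi-stage pipeline is meant to provide.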
get_load_order_barrier
get_load_order_barrier(ref[AddressSpace._value._mlir_value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarriers
Get load order barrier.
Coordinates the MainLoad warp with the EpilogueLoad warp so that epilogue loads do not start before the mainloop prologue completes.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarriers
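The ordering guarantee described for the load order barrier can be pictured with a one-shot signal. The Python sketch below is an analogy under stated assumptions, not the kernel's mechanism: a threading.Event stands in for the barrier, two threads stand in for the MainLoad and EpilogueLoad warps, and every name is hypothetical.

```python
import threading


def run_load_order():
    """Show that the epilogue load waits for the mainloop prologue."""
    log = []
    prologue_done = threading.Event()  # stand-in for the load order barrier

    def main_load():
        log.append("mainloop prologue")  # issue the prologue loads first
        prologue_done.set()              # then release the epilogue loader

    def epilogue_load():
        prologue_done.wait()             # block until the prologue is done
        log.append("epilogue load")

    # Start the epilogue thread first to show that it really waits.
    t_epi = threading.Thread(target=epilogue_load)
    t_main = threading.Thread(target=main_load)
    t_epi.start()
    t_main.start()
    t_epi.join()
    t_main.join()
    return log


print(run_load_order())
# ['mainloop prologue', 'epilogue load']
```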