Mojo struct
Conv2dSmem
struct Conv2dSmem[act_type: DType, filter_type: DType, out_type: DType, *, config: Conv2dConfig[act_type, filter_type, out_type]]
Shared memory layout for the SM100 Conv2D forward-propagation (fprop) kernel.
This struct manages shared memory allocation for:
- Activation tiles (after im2col transformation)
- Filter tiles
- Output tiles for accumulation
- Synchronization barriers
The layout mirrors B200MatmulSmem, with convolution-specific (implicit GEMM) semantics:
- A tiles = im2col'd activation (M x K, where M = NHW counts output pixels and K = CRS)
- B tiles = filter (transposed, K x N, where K = CRS and N = K_out, the number of filters)
- C tiles = output (M x N)
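The implicit-GEMM mapping above can be sketched in plain Python. This is an illustration only, not the kernel's code: the helper names are hypothetical, the activation is assumed to be NHWC, and the K dimension is assumed to be ordered (r, s, c) with the channel varying fastest. The kernel's actual K ordering may differ.

```python
def conv_out_size(size, filt, stride=1, pad=0, dilation=1):
    """Output extent of one spatial dimension of a convolution."""
    return (size + 2 * pad - dilation * (filt - 1) - 1) // stride + 1


def implicit_gemm_shape(n, h, w, c, k_out, r, s, stride=1, pad=0):
    """Return (M, N, K) of the GEMM that an fprop convolution maps to."""
    p = conv_out_size(h, r, stride, pad)   # output height
    q = conv_out_size(w, s, stride, pad)   # output width
    return n * p * q, k_out, c * r * s     # M = output pixels, N = K_out, K = CRS


def im2col_coord(m, k, h, w, c, r, s, stride=1, pad=0):
    """Map a GEMM A-tile coordinate (m, k) to an NHWC activation index.

    Returns None when the coordinate falls in the zero padding.
    Assumes (hypothetically) that K is ordered (r, s, c), channel fastest.
    """
    p = conv_out_size(h, r, stride, pad)
    q = conv_out_size(w, s, stride, pad)
    img, rem = divmod(m, p * q)            # which image, which output pixel
    out_y, out_x = divmod(rem, q)
    ry, rem_k = divmod(k, s * c)           # filter row
    sx, ch = divmod(rem_k, c)              # filter column, input channel
    in_y = out_y * stride - pad + ry
    in_x = out_x * stride - pad + sx
    if 0 <= in_y < h and 0 <= in_x < w:
        return img, in_y, in_x, ch
    return None
```

For a 3x3 filter with padding 1, the first GEMM element reads from the padded border, so `im2col_coord(0, 0, 8, 8, 16, 3, 3, pad=1)` returns `None`, which a loader would treat as a zero.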
Parameters
- act_type (DType): Activation data type.
- filter_type (DType): Filter data type.
- out_type (DType): Output data type.
- config (Conv2dConfig[act_type, filter_type, out_type]): Kernel configuration.
Fields
- input_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles): Activation and filter tile storage.
- output_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].OutputTiles): Output tile storage.
- source_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].SourceTiles): Source C tile storage for residual operations.
- pipelines (Conv2dSmem[act_type, filter_type, out_type, config=config].Pipelines): Mainloop, accumulator, and CLC pipeline barriers.
- epi_load_pipeline (Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadPipeline): Epilogue load pipeline barriers.
- load_order_barrier (Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarrier): Barrier ordering mainloop loads before epilogue loads.
Implemented traits
AnyType, ImplicitlyDestructible
comptime members
act_smem_elements
comptime act_smem_elements = Layout[*?, *?].static_product
ActTileArray
comptime ActTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles.ATileArray
BK
comptime BK = config.block_tile_shape[2]
BM
comptime BM = config.block_tile_shape[0]
BN
comptime BN = config.block_tile_shape[1]
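Because BM, BN, and BK are read from `block_tile_shape` at indices 0, 1, and 2, the number of output tiles and K-loop iterations follows from ceiling division of the GEMM shape. The sketch below is hypothetical (the function name and example sizes are not from this page) and ignores any scheduling or clustering the kernel applies.

```python
def tile_grid(gemm_m, gemm_n, gemm_k, block_tile_shape):
    """Count M/N tiles and K iterations for a (BM, BN, BK) block tile."""
    bm, bn, bk = block_tile_shape      # same index order as BM, BN, BK above
    tiles_m = -(-gemm_m // bm)         # ceiling division
    tiles_n = -(-gemm_n // bn)
    k_iters = -(-gemm_k // bk)         # partial last K tile still needs a load
    return tiles_m, tiles_n, k_iters
```

With hypothetical sizes M = 12800, N = 256, K = 144 and a (128, 128, 64) block tile, this gives 100 x 2 output tiles and 3 K iterations, the last one only partially filled.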
EpiLoadBarriers
comptime EpiLoadBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadPipeline.BarrierArray
EpiLoadPipeline
comptime EpiLoadPipeline = EpiLoadPipelineStorage[Conv2dSmem[act_type, filter_type, out_type, config=config].num_epi_load_stages]
filter_smem_elements
comptime filter_smem_elements = Layout[*?, *?].static_product
FilterTileArray
comptime FilterTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles.BTileArray
InputTiles
comptime InputTiles = StandardTileStorage[act_type, filter_type, IndexList(Conv2dSmem[act_type, filter_type, out_type, config=config].BM, Conv2dSmem[act_type, filter_type, out_type, config=config].BK), IndexList(Conv2dSmem[act_type, filter_type, out_type, config=config].BN, Conv2dSmem[act_type, filter_type, out_type, config=config].BK), Conv2dSmem[act_type, filter_type, out_type, config=config].num_pipeline_stages]
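InputTiles holds one BM x BK activation tile and one BN x BK filter tile per pipeline stage. A rough footprint estimate, assuming dense storage with no alignment padding (an assumption; StandardTileStorage may pad or align), is:

```python
def input_tiles_smem_bytes(bm, bn, bk, act_bytes, filter_bytes, num_pipeline_stages):
    """Estimate bytes for the A (BM x BK) and B (BN x BK) tile ring buffers."""
    a_tile = bm * bk * act_bytes        # one im2col'd activation tile
    b_tile = bn * bk * filter_bytes     # one filter tile
    return num_pipeline_stages * (a_tile + b_tile)
```

For example, 2-byte activations and filters with BM = BN = 128, BK = 64, and 4 stages need about 128 KiB, which is why the stage count is usually the first knob to turn when shared memory runs short.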
LoadOrderBarrier
comptime LoadOrderBarrier = LoadOrderBarrierStorage
LoadOrderBarriers
comptime LoadOrderBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarrier.BarrierArray
num_accum_pipeline_stages
comptime num_accum_pipeline_stages = config.num_accum_pipeline_stages
num_clc_pipeline_stages
comptime num_clc_pipeline_stages = config.num_clc_pipeline_stages
num_epi_load_stages
comptime num_epi_load_stages = 2
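With two epilogue load stages, the source C buffers are double-buffered. A common convention in barrier-based ring pipelines (assumed here, not confirmed for EpiLoadPipelineStorage) is to pick the slot with a modulo and flip a phase bit each time the ring wraps, so a waiter can tell a fresh completion from a stale one:

```python
def stage_and_phase(iteration, num_stages=2):
    """Slot index and phase bit for the given pipeline iteration."""
    stage = iteration % num_stages            # which buffer to use
    phase = (iteration // num_stages) & 1     # flips each full trip around the ring
    return stage, phase
```

The first five iterations visit (0, 0), (1, 0), (0, 1), (1, 1), (0, 0).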
num_group_pipeline_stages
comptime num_group_pipeline_stages = (Conv2dSmem[act_type, filter_type, out_type, config=config].num_pipeline_stages // config)
num_output_stages
comptime num_output_stages = config.num_output_stages
num_pipeline_stages
comptime num_pipeline_stages = config.num_pipeline_stages
out_smem_layout
comptime out_smem_layout = Layout.row_major(Conv2dSmem[act_type, filter_type, out_type, config=config].OutputM, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputN)
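A row-major OutputM x OutputN layout places element (m, n) at offset m * OutputN + n, so consecutive n values are contiguous in shared memory. A minimal Python sketch of that index function (the helper name is hypothetical):

```python
def row_major_offset(m, n, output_m, output_n):
    """Linear offset of element (m, n) in a row-major output_m x output_n tile."""
    if not (0 <= m < output_m and 0 <= n < output_n):
        raise IndexError("coordinate outside the output tile")
    return m * output_n + n   # row stride = output_n, column stride = 1
```

For a 128 x 32 output tile, element (2, 5) sits at offset 69.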
OutputM
comptime OutputM = config.output_tile_shape[0]
OutputN
comptime OutputN = config.output_tile_shape[1]
OutputTiles
comptime OutputTiles = OutputTileStorage[out_type, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputM, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputN, Conv2dSmem[act_type, filter_type, out_type, config=config].num_output_stages]
OutTileArray
comptime OutTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].OutputTiles.CTileArray
Pipelines
comptime Pipelines = SmemPipelineBundle[Conv2dSmem[act_type, filter_type, out_type, config=config].num_group_pipeline_stages, Conv2dSmem[act_type, filter_type, out_type, config=config].num_accum_pipeline_stages, Conv2dSmem[act_type, filter_type, out_type, config=config].num_clc_pipeline_stages, StandardTilePayload[act_type, filter_type, IndexList(Conv2dSmem[act_type, filter_type, out_type, config=config].BM, Conv2dSmem[act_type, filter_type, out_type, config=config].BK), IndexList(Conv2dSmem[act_type, filter_type, out_type, config=config].BN, Conv2dSmem[act_type, filter_type, out_type, config=config].BK), Conv2dSmem[act_type, filter_type, out_type, config=config].num_pipeline_stages]]
SourceTiles
comptime SourceTiles = SourceTileStorage[out_type, IndexList(Conv2dSmem[act_type, filter_type, out_type, config=config].OutputM, Conv2dSmem[act_type, filter_type, out_type, config=config].OutputN), Conv2dSmem[act_type, filter_type, out_type, config=config].num_epi_load_stages]
SrcTileArray
comptime SrcTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].SourceTiles.SrcTileArray
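On the epilogue side, OutputTiles and SourceTiles both hold OutputM x OutputN tiles of out_type: num_output_stages of the former and num_epi_load_stages (2) of the latter. A rough estimate of their combined footprint, again assuming dense storage with no padding and ignoring barrier storage (both assumptions), is:

```python
def epilogue_smem_bytes(output_m, output_n, out_bytes, num_output_stages,
                        num_epi_load_stages=2):
    """Estimate bytes for OutputTiles plus SourceTiles."""
    tile = output_m * output_n * out_bytes        # one OutputM x OutputN tile
    output_tiles = num_output_stages * tile       # C tiles written by the epilogue
    source_tiles = num_epi_load_stages * tile     # source C tiles for residual ops
    return output_tiles + source_tiles
```

With a hypothetical 128 x 32 output tile, 2-byte outputs, and 2 output stages, this comes to 32 KiB, half of it for the source tiles that exist only to support residual operations.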
Methods
act_tiles
act_tiles(ref[AddressSpace._value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].ActTileArray
Get activation tiles (im2col'd).
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].ActTileArray
filter_tiles
filter_tiles(ref[AddressSpace._value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].FilterTileArray
Get filter tiles.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].FilterTileArray
out_tiles
out_tiles(ref[AddressSpace._value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].OutTileArray
Get output tiles.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].OutTileArray
src_tiles
src_tiles(ref[AddressSpace._value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].SrcTileArray
Get source C tiles (for residual operations).
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].SrcTileArray
epi_load_barriers
epi_load_barriers(ref[AddressSpace._value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadBarriers
Get the epilogue load pipeline barriers.
These barriers synchronize the EpilogueLoad warp (producer) with the Epilogue warps (consumers) while the source C tensor is loaded.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadBarriers
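The producer/consumer relationship these barriers enforce can be modeled on the CPU with a two-slot ring buffer guarded by "empty" and "full" counting semaphores. This is only an analogy: Python threads stand in for the EpilogueLoad and Epilogue warps, and threading.Semaphore stands in for the GPU barriers; none of this is the kernel's API.

```python
import threading


def run_epi_load_pipeline(tiles, num_stages=2):
    """Stream tiles through a num_stages ring; return them in consumed order."""
    slots = [None] * num_stages
    empty = threading.Semaphore(num_stages)  # free slots the producer may fill
    full = threading.Semaphore(0)            # filled slots the consumer may read
    consumed = []

    def producer():  # stands in for the EpilogueLoad warp
        for i, tile in enumerate(tiles):
            empty.acquire()                  # wait until the slot was drained
            slots[i % num_stages] = tile     # "load" source C into the slot
            full.release()                   # signal: slot is ready

    def consumer():  # stands in for the Epilogue warps
        for i in range(len(tiles)):
            full.acquire()                   # wait until the slot is filled
            consumed.append(slots[i % num_stages])
            empty.release()                  # hand the slot back

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

Because the producer can run at most num_stages tiles ahead, it never overwrites a slot the consumer has not read yet, and `run_epi_load_pipeline(list(range(10)))` returns the tiles in their original order.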
get_load_order_barrier
get_load_order_barrier(ref[AddressSpace._value] self) -> Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarriers
Get the load order barrier.
This barrier coordinates the MainLoad warp with the EpilogueLoad warp so that epilogue loads do not start before the mainloop prologue completes.
Returns:
Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarriers
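The ordering guarantee can be illustrated with a one-shot event: the main-load side signals once its prologue has been issued, and the epilogue-load side waits on that signal before starting. This is a CPU analogy using threading.Event, not the kernel's barrier API; the log strings are illustrative.

```python
import threading


def run_load_order_demo():
    """Return an event log showing epilogue loads gated on the prologue."""
    log = []
    lock = threading.Lock()
    prologue_done = threading.Event()   # stands in for the load order barrier

    def main_load():  # stands in for the MainLoad warp
        with lock:
            log.append("mainloop prologue")
        prologue_done.set()             # arrive: prologue loads are issued
        with lock:
            log.append("mainloop steady state")

    def epilogue_load():  # stands in for the EpilogueLoad warp
        prologue_done.wait()            # block until the prologue is done
        with lock:
            log.append("epilogue load")

    threads = [threading.Thread(target=epilogue_load), threading.Thread(target=main_load)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The epilogue thread is started first on purpose: it still cannot log before the prologue, although its position relative to the steady-state entry is not fixed.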