Mojo struct
Conv2dSmem
struct Conv2dSmem[act_type: DType, filter_type: DType, out_type: DType, *, config: Conv2dConfig[act_type, filter_type, out_type]]
Shared memory layout for SM100 Conv2D fprop kernel.
This struct manages shared memory allocation for:
- Activation tiles (after im2col transformation)
- Filter tiles
- Output tiles for accumulation
- Synchronization barriers
The layout mirrors B200MatmulSmem but with conv-specific semantics:
- A tiles = im2col'd activation (M x K where M = NHW, K = CRS)
- B tiles = filter (transposed, K x N where K = CRS, N = K_out)
- C tiles = output (M x N)
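The A/B/C mapping above can be illustrated with a small Python sketch that derives the implicit-GEMM dimensions from a convolution shape. This is only a model of the arithmetic, not part of the Mojo API: `ConvShape` and `implicit_gemm_dims` are hypothetical names, and the sketch assumes that H and W in "M = NHW" refer to the output spatial dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvShape:
    """Hypothetical description of a 2D convolution problem (illustration only)."""
    batch: int         # N
    out_h: int         # H, assumed to be the output height
    out_w: int         # W, assumed to be the output width
    in_channels: int   # C
    filter_h: int      # R
    filter_w: int      # S
    out_channels: int  # K_out


def implicit_gemm_dims(shape: ConvShape) -> tuple[int, int, int]:
    """Return (M, N, K) of the GEMM that the im2col'd convolution maps to."""
    m = shape.batch * shape.out_h * shape.out_w               # rows of A and C
    k = shape.in_channels * shape.filter_h * shape.filter_w   # reduction dimension
    n = shape.out_channels                                    # columns of B and C
    return m, n, k


shape = ConvShape(batch=2, out_h=8, out_w=8, in_channels=16,
                  filter_h=3, filter_w=3, out_channels=32)
print(implicit_gemm_dims(shape))  # (128, 32, 144)
```

With these dimensions, A is a 128 x 144 activation matrix, B is a 144 x 32 filter matrix, and C is a 128 x 32 output matrix.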
Parameters
- act_type (DType): Activation data type.
- filter_type (DType): Filter data type.
- out_type (DType): Output data type.
- config (Conv2dConfig[act_type, filter_type, out_type]): Kernel configuration.
Fields
- input_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles)
- output_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].OutputTiles)
- source_tiles (Conv2dSmem[act_type, filter_type, out_type, config=config].SourceTiles)
- pipelines (Conv2dSmem[act_type, filter_type, out_type, config=config].Pipelines)
- epi_load_pipeline (Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadPipeline)
- load_order_barrier (Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarrier)
Implemented traits
comptime members
act_smem_elements
comptime act_smem_elements = Layout[*?, *?].static_product
ActTileArray
comptime ActTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles.ATileArray
BK
comptime BK = config.block_tile_shape[Int(2)]
BM
comptime BM = config.block_tile_shape[Int(0)]
BN
comptime BN = config.block_tile_shape[Int(1)]
EpiLoadBarriers
comptime EpiLoadBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].EpiLoadPipeline.BarrierArray
EpiLoadPipeline
comptime EpiLoadPipeline = EpiLoadPipelineStorage[(config.mma_shape[Int(1)] // config.output_tile_shape[Int(1)])]
filter_smem_elements
comptime filter_smem_elements = Layout[*?, *?].static_product
FilterTileArray
comptime FilterTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].InputTiles.BTileArray
InputTiles
comptime InputTiles = StandardTileStorage[act_type, filter_type, IndexList(config.block_tile_shape[Int(0)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), IndexList(config.block_tile_shape[Int(1)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), config.num_pipeline_stages]
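The `InputTiles` definition is parameterized by a BM x BK activation tile, a BN x BK filter tile, and `num_pipeline_stages`. A rough Python estimate of the resulting shared memory footprint might look like the sketch below. It assumes one A tile and one B tile per pipeline stage with no padding or alignment overhead, which this page does not confirm; `input_tile_bytes` and the example tile sizes are hypothetical.

```python
def input_tile_bytes(bm: int, bn: int, bk: int, num_stages: int,
                     act_bytes: int, filter_bytes: int) -> int:
    """Estimate bytes used by the pipelined A (activation) and B (filter) tiles.

    Assumption: each stage holds exactly one BM x BK tile and one BN x BK tile,
    stored densely. The real storage type may add padding or alignment.
    """
    a_elems = bm * bk  # activation tile elements per stage
    b_elems = bn * bk  # filter tile elements per stage
    per_stage = a_elems * act_bytes + b_elems * filter_bytes
    return per_stage * num_stages


# Hypothetical configuration: 128 x 128 x 64 block tile, 4 stages, 2-byte elements.
print(input_tile_bytes(bm=128, bn=128, bk=64, num_stages=4,
                       act_bytes=2, filter_bytes=2))  # 131072
```

An estimate like this is useful for checking whether a candidate block tile shape and stage count leave room for the output tiles, source tiles, and barriers that the struct also holds.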
LoadOrderBarrier
comptime LoadOrderBarrier = LoadOrderBarrierStorage
LoadOrderBarriers
comptime LoadOrderBarriers = Conv2dSmem[act_type, filter_type, out_type, config=config].LoadOrderBarrier.BarrierArray
num_accum_pipeline_stages
comptime num_accum_pipeline_stages = config.num_accum_pipeline_stages
num_clc_pipeline_stages
comptime num_clc_pipeline_stages = config.num_clc_pipeline_stages
num_epi_load_stages
comptime num_epi_load_stages = (config.mma_shape[Int(1)] // config.output_tile_shape[Int(1)])
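The expression above is a floor division of the MMA N extent by the output tile N extent. A minimal Python sketch of the same arithmetic follows; the tuple layouts and example values are assumptions chosen for illustration, and `num_epi_load_stages` here is a plain function rather than the Mojo comptime member.

```python
def num_epi_load_stages(mma_shape: tuple[int, ...],
                        output_tile_shape: tuple[int, ...]) -> int:
    """Mirror of mma_shape[1] // output_tile_shape[1] (floor division)."""
    return mma_shape[1] // output_tile_shape[1]


# Hypothetical shapes: MMA N extent of 256, output tile N extent of 32.
print(num_epi_load_stages((128, 256, 16), (128, 32)))  # 8
```

Because the division truncates, an output tile width that does not evenly divide the MMA N extent would yield fewer stages than tiles needed to cover it.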
num_group_pipeline_stages
comptime num_group_pipeline_stages = (config // config)
num_output_stages
comptime num_output_stages = config.num_output_stages
num_pipeline_stages
comptime num_pipeline_stages = config.num_pipeline_stages
out_smem_layout
comptime out_smem_layout = Layout.row_major(config.output_tile_shape[Int(0)], config.output_tile_shape[Int(1)])
OutputM
comptime OutputM = config.output_tile_shape[Int(0)]
OutputN
comptime OutputN = config.output_tile_shape[Int(1)]
OutputTiles
comptime OutputTiles = OutputTileStorage[out_type, config.output_tile_shape[Int(0)], config.output_tile_shape[Int(1)], config.num_output_stages]
OutTileArray
comptime OutTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].OutputTiles.CTileArray
Pipelines
comptime Pipelines = SmemPipelineBundle[(config // config), config.num_accum_pipeline_stages, config.num_clc_pipeline_stages, StandardTilePayload[act_type, filter_type, IndexList(config.block_tile_shape[Int(0)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), IndexList(config.block_tile_shape[Int(1)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), config.num_pipeline_stages]]
SourceTiles
comptime SourceTiles = SourceTileStorage[out_type, IndexList(config.output_tile_shape[Int(0)], config.output_tile_shape[Int(1)], __list_literal__=NoneType(None)), (config.mma_shape[Int(1)] // config.output_tile_shape[Int(1)])]
SrcTileArray
comptime SrcTileArray = Conv2dSmem[act_type, filter_type, out_type, config=config].SourceTiles.SrcTileArray
Methods
act_tiles
def act_tiles(ref[AddressSpace._value] self) -> Self.ActTileArray
Get activation tiles (im2col'd).
Returns:
Self.ActTileArray
filter_tiles
def filter_tiles(ref[AddressSpace._value] self) -> Self.FilterTileArray
Get filter tiles.
Returns:
Self.FilterTileArray
out_tiles
def out_tiles(ref[AddressSpace._value] self) -> Self.OutTileArray
Get output tiles.
Returns:
Self.OutTileArray
src_tiles
def src_tiles(ref[AddressSpace._value] self) -> Self.SrcTileArray
Get source C tiles (for residual operations).
Returns:
Self.SrcTileArray
epi_load_barriers
def epi_load_barriers(ref[AddressSpace._value] self) -> Self.EpiLoadBarriers
Get epilogue load pipeline barriers.
These barriers synchronize the EpilogueLoad warp (producer) with the Epilogue warps (consumers) while the source C tensor is loaded.
Returns:
Self.EpiLoadBarriers
get_load_order_barrier
def get_load_order_barrier(ref[AddressSpace._value] self) -> Self.LoadOrderBarriers
Get load order barrier.
This barrier coordinates the MainLoad warp with the EpilogueLoad warp so that epilogue loads do not start before the mainloop prologue completes.
Returns:
Self.LoadOrderBarriers
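The ordering guarantee described for the load order barrier can be illustrated with a host-side Python analogy. This is not how the GPU kernel is implemented: a `threading.Event` stands in for the shared memory barrier, two Python threads stand in for the MainLoad and EpilogueLoad warps, and every name in the sketch is hypothetical.

```python
import threading


def run_load_order_demo() -> list[str]:
    """Show that the 'epilogue' thread cannot load before the prologue is done."""
    log: list[str] = []
    prologue_done = threading.Event()  # stand-in for the load order barrier

    def main_load() -> None:
        log.append("main: prologue loads issued")
        prologue_done.set()  # signal that the mainloop prologue has completed

    def epilogue_load() -> None:
        prologue_done.wait()  # block until the prologue signal arrives
        log.append("epilogue: source C load issued")

    # Start the epilogue thread first to show that start order does not matter.
    threads = [threading.Thread(target=epilogue_load),
               threading.Thread(target=main_load)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print(run_load_order_demo())
```

Even though the epilogue thread is started first, its load is always logged after the prologue entry, which is the same ordering the barrier is described as enforcing between the two warps.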