Mojo struct
B200MatmulSmem
struct B200MatmulSmem[a_type: DType, b_type: DType, c_type: DType, transpose_b: Bool, *, config: MatmulConfig[a_type, b_type, c_type, transpose_b]]
Shared memory layout for the B200 (SM100) matrix multiplication kernel.
This struct manages the shared memory allocation for:
- Input tiles (A and B matrices) with multi-stage pipelining
- Output tile (C matrix) for accumulation
- Synchronization barriers for producer-consumer coordination
- CLC (Cluster Launch Control) barriers and response storage
- TMEM (Tensor Memory) address and deallocation barrier
The memory is organized to support asynchronous TMA loads and efficient, bank-conflict-free access patterns for tensor core operations.
Type aliases for the tile arrays (ATileArray, BTileArray, CTileArray) keep function signatures short.
Fields
- input_tiles (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].InputTiles)
- output_tiles (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputTiles)
- epilogue_load_tiles_storage (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].EpilogueLoadTileArray.Storage)
- pipelines (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].Pipelines)
Implemented traits
AnyType, ImplicitlyDestructible
comptime members
ATileArray
comptime ATileArray = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].InputTiles.ATileArray
BK
comptime BK = config.block_tile_shape[2]
BM
comptime BM = config.block_tile_shape[0]
BN
comptime BN = config.block_tile_shape[1]
BTileArray
comptime BTileArray = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].InputTiles.BTileArray
CTileArray
comptime CTileArray = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputTiles.CTileArray
epilogue_load_tile_cols
comptime epilogue_load_tile_cols = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM if config.AB_swapped else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].MMA_N if config.epilogue_is_1d else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM if config.AB_swapped else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputN
epilogue_load_tile_rows
comptime epilogue_load_tile_rows = 1 if config.epilogue_is_1d else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].MMA_N if config.AB_swapped else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM
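The chained conditionals in epilogue_load_tile_cols and epilogue_load_tile_rows are easier to read unrolled. The Python model below is an illustration, not Mojo and not part of the library; it assumes Mojo groups chained conditional expressions right to left, as Python does. Under that assumption, AB_swapped is tested first in epilogue_load_tile_cols, so its second "BM if config.AB_swapped" branch is never reached. The function names and example values are hypothetical.

```python
from typing import Tuple


def cols_as_written(bm: int, mma_n: int, output_n: int,
                    ab_swapped: bool, epilogue_is_1d: bool) -> int:
    # Literal transcription of the epilogue_load_tile_cols expression;
    # Python groups chained conditionals right to left.
    return bm if ab_swapped else mma_n if epilogue_is_1d else bm if ab_swapped else output_n


def epilogue_load_tile_shape(bm: int, mma_n: int, output_n: int,
                             ab_swapped: bool, epilogue_is_1d: bool) -> Tuple[int, int]:
    """Return (rows, cols) of one epilogue load tile (hypothetical helper)."""
    # rows: one row for a 1D epilogue, else MMA_N when A/B are swapped, else BM.
    if epilogue_is_1d:
        rows = 1
    elif ab_swapped:
        rows = mma_n
    else:
        rows = bm
    # cols: AB_swapped is checked first, so the repeated "BM if AB_swapped" never fires.
    if ab_swapped:
        cols = bm
    elif epilogue_is_1d:
        cols = mma_n
    else:
        cols = output_n
    return rows, cols


# Example values only (BM=128, MMA_N=256, OutputN=64); real values come from MatmulConfig.
for swapped in (False, True):
    for is_1d in (False, True):
        print(swapped, is_1d, epilogue_load_tile_shape(128, 256, 64, swapped, is_1d))
```

With these example values, the four cases give (128, 64), (1, 256), (256, 128), and (1, 128) for (not swapped, 2D), (not swapped, 1D), (swapped, 2D), and (swapped, 1D).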
EpilogueLoadTileArray
comptime EpilogueLoadTileArray = SMemTileArray2DRowMajor[c_type, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].epilogue_load_tile_rows, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].epilogue_load_tile_cols, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_epilogue_load_stages]
InputTiles
comptime InputTiles = StandardTileStorage[a_type, b_type, IndexList(B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BK, __list_literal__=NoneType(None)), IndexList(B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BN, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BK, __list_literal__=NoneType(None)), B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_pipeline_stages]
Layouts
comptime Layouts = SmemLayouts[a_type, b_type, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BN, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BK, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputM, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputN, config.a_swizzle, config.b_swizzle, transpose_b]
MMA_N
comptime MMA_N = config.mma_shape[1]
num_accum_pipeline_stages
comptime num_accum_pipeline_stages = config.num_accum_pipeline_stages
num_clc_pipeline_stages
comptime num_clc_pipeline_stages = config.num_clc_pipeline_stages
num_epilogue_load_stages
comptime num_epilogue_load_stages = config.num_accum_pipeline_stages if config.AB_swapped or config.epilogue_is_1d else config.num_tma_epilogue_pipeline_stages if config.use_tma_epilogue_load else 0
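The stage count above selects between three cases. This Python model is an illustration only (not Mojo, not a library API), with a hypothetical function name; it assumes the condition of the outer conditional is the whole expression "config.AB_swapped or config.epilogue_is_1d", which is how Python parses it.

```python
def num_epilogue_load_stages(ab_swapped: bool, epilogue_is_1d: bool,
                             use_tma_epilogue_load: bool,
                             num_accum_pipeline_stages: int,
                             num_tma_epilogue_pipeline_stages: int) -> int:
    # Outer condition: "AB_swapped or epilogue_is_1d".
    if ab_swapped or epilogue_is_1d:
        return num_accum_pipeline_stages
    # Otherwise, stages are allocated only for a TMA-based epilogue load.
    if use_tma_epilogue_load:
        return num_tma_epilogue_pipeline_stages
    return 0


print(num_epilogue_load_stages(False, False, False, 2, 3))  # prints 0
```

Under this reading, the swapped and 1D paths reserve num_accum_pipeline_stages stages whether or not use_tma_epilogue_load is set.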
num_group_pipeline_stages
comptime num_group_pipeline_stages = (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_pipeline_stages // config)
num_output_stages
comptime num_output_stages = config.num_output_stages
num_pipeline_stages
comptime num_pipeline_stages = config.num_pipeline_stages
OutputM
comptime OutputM = config.output_tile_shape[0]
OutputN
comptime OutputN = config.output_tile_shape[1]
OutputTiles
comptime OutputTiles = OutputTileStorage[c_type, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputM, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputN, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_output_stages]
Pipelines
comptime Pipelines = SmemPipelineBundle[B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_group_pipeline_stages, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_accum_pipeline_stages, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_clc_pipeline_stages, StandardTilePayload[a_type, b_type, IndexList(B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BK, __list_literal__=NoneType(None)), IndexList(B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BN, B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BK, __list_literal__=NoneType(None)), B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_pipeline_stages], B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].num_epilogue_load_stages]
Methods
a_tiles
a_tiles(ref[AddressSpace._value] self) -> B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].ATileArray
Returns:
B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].ATileArray
b_tiles
b_tiles(ref[AddressSpace._value] self) -> B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BTileArray
Returns:
B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BTileArray
c_tiles
c_tiles(ref[AddressSpace._value] self) -> B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].CTileArray
Returns:
B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].CTileArray
epilogue_load_tiles
epilogue_load_tiles(ref[AddressSpace._value] self) -> B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].EpilogueLoadTileArray
Returns:
B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].EpilogueLoadTileArray
ab_pipeline_size
static ab_pipeline_size() -> Int
Total size of the A and B tiles across all pipeline stages, in elements.
Returns:
Int
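A minimal Python sketch of the element count ab_pipeline_size reports, assuming each pipeline stage holds one BM x BK A tile and one BN x BK B tile (the IndexList shapes passed to InputTiles). This is an illustration, not the Mojo implementation; the function and the example values are hypothetical.

```python
def ab_pipeline_size(bm: int, bn: int, bk: int, num_pipeline_stages: int) -> int:
    # Assumption: per stage, one BM x BK A tile plus one BN x BK B tile.
    a_tile_elems = bm * bk
    b_tile_elems = bn * bk
    return (a_tile_elems + b_tile_elems) * num_pipeline_stages


# block_tile_shape = (BM, BN, BK); example values only.
bm, bn, bk = (128, 256, 64)
print(ab_pipeline_size(bm, bn, bk, num_pipeline_stages=4))  # prints 98304
```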
c_output_size
epilogue_load_tile_size
static epilogue_load_tile_size() -> Int
Size of the epilogue load tiles across all stages, in elements. Zero when config.use_tma_epilogue_load is False.
Returns:
Int
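A minimal Python sketch of the epilogue load tile size, assuming EpilogueLoadTileArray stores num_epilogue_load_stages row-major tiles of epilogue_load_tile_rows x epilogue_load_tile_cols elements of c_type. The function name and values are hypothetical; the sketch only shows that a stage count of zero yields a size of zero.

```python
def epilogue_load_tile_size(rows: int, cols: int, num_epilogue_load_stages: int) -> int:
    # Assumption: one rows x cols tile per epilogue load stage.
    return rows * cols * num_epilogue_load_stages


print(epilogue_load_tile_size(128, 64, 2))  # prints 16384
print(epilogue_load_tile_size(128, 64, 0))  # prints 0
```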
total_tile_size
static total_tile_size() -> Int
Total tile storage size (A + B + C + epilogue load), in elements.
Returns:
Int
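Because total_tile_size counts elements, and A, B, and C can have different dtypes, a byte budget needs per-dtype widths. The Python sketch below is illustrative only: the helper names are hypothetical, and the C element count assumes c_output_size equals OutputM x OutputN x num_output_stages, which this page does not confirm. Both the C output tiles and the epilogue load tiles use c_type, per OutputTiles and EpilogueLoadTileArray.

```python
def total_tile_size(ab_elems: int, c_elems: int, epilogue_elems: int) -> int:
    # Documented sum: A + B + C + epilogue load, counted in elements.
    return ab_elems + c_elems + epilogue_elems


def total_tile_bytes(a_elems: int, b_elems: int, c_elems: int, epilogue_elems: int,
                     a_bytes: int, b_bytes: int, c_bytes: int) -> int:
    # Weight each group by its own dtype width; C and epilogue tiles share c_type.
    return a_elems * a_bytes + b_elems * b_bytes + (c_elems + epilogue_elems) * c_bytes


# Example: BM=128, BN=256, BK=64, 4 input stages; OutputM=128, OutputN=32, 2 output stages.
a_elems = 128 * 64 * 4    # hypothetical A share of ab_pipeline_size()
b_elems = 256 * 64 * 4    # hypothetical B share of ab_pipeline_size()
c_elems = 128 * 32 * 2    # assumed c_output_size(): OutputM * OutputN * num_output_stages
epi_elems = 0             # e.g. no epilogue load stages
print(total_tile_size(a_elems + b_elems, c_elems, epi_elems))           # prints 106496
print(total_tile_bytes(a_elems, b_elems, c_elems, epi_elems, 1, 1, 2))  # prints 114688
```

Here A and B are taken as 1-byte (FP8-sized) elements and C as 2-byte (BF16-sized) elements, so the byte total differs from a naive elements x width product.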