
Mojo struct

B200MatmulSmem

struct B200MatmulSmem[a_type: DType, b_type: DType, c_type: DType, transpose_b: Bool, *, config: MatmulConfig[a_type, b_type, c_type, transpose_b]]

Shared memory layout for B200 SM100 matrix multiplication kernel.

This struct manages the shared memory allocation for:

  • Input tiles (A and B matrices) with multi-stage pipelining
  • Output tile (C matrix) for accumulation
  • Synchronization barriers for producer-consumer coordination
  • CLC (Cluster Launch Control) barriers and response storage
  • TMEM (Tensor Memory) address and deallocation barrier

The memory is organized to support asynchronous TMA loads and efficient bank-conflict-free access patterns for tensor core operations.

Type aliases are provided for tile types (ATile, BTile, CTile) to enable cleaner function signatures.

Fields

  • input_tiles (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].InputTiles)
  • output_tiles (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputTiles)
  • epilogue_load_tiles_storage (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].EpilogueLoadTileArray.Storage)
  • pipelines (B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].Pipelines)

Implemented traits​

AnyType, ImplicitlyDeletable

comptime members​

ATileArray​

comptime ATileArray = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].InputTiles.ATileArray

BK​

comptime BK = config.block_tile_shape[Int(2)]

BM​

comptime BM = config.block_tile_shape[Int(0)]

BN​

comptime BN = config.block_tile_shape[Int(1)]

BTileArray​

comptime BTileArray = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].InputTiles.BTileArray

CTileArray​

comptime CTileArray = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputTiles.CTileArray

epilogue_load_tile_cols​

comptime epilogue_load_tile_cols = B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM if config.AB_swapped else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].MMA_N if config.epilogue_is_1d else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM if config.AB_swapped else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].OutputN

epilogue_load_tile_rows​

comptime epilogue_load_tile_rows = Int(1) if config.epilogue_is_1d else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].MMA_N if config.AB_swapped else B200MatmulSmem[a_type, b_type, c_type, transpose_b, config=config].BM
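
The two epilogue tile dimensions above are nested conditional expressions, which are easy to misread. The following is a minimal Python model (not Mojo, and not the library's code), assuming Mojo's `x if c else y` expressions group right to left as Python's do. The function name and its parameters are hypothetical stand-ins for `BM`, `MMA_N`, `OutputN`, `config.AB_swapped`, and `config.epilogue_is_1d`.

```python
def epilogue_load_tile_shape(bm, mma_n, output_n, ab_swapped, epilogue_is_1d):
    """Return (rows, cols) following the comptime expressions in the listing."""
    # Rows: the 1D epilogue check comes first, then the A/B swap.
    rows = 1 if epilogue_is_1d else mma_n if ab_swapped else bm
    # Cols: copied branch for branch from the listing. The second
    # `bm if ab_swapped` can never be taken, because ab_swapped is already
    # known to be False by the time that branch is evaluated.
    cols = bm if ab_swapped else mma_n if epilogue_is_1d else bm if ab_swapped else output_n
    return rows, cols


if __name__ == "__main__":
    # Illustrative values only; real values come from MatmulConfig.
    for swapped in (False, True):
        for is_1d in (False, True):
            shape = epilogue_load_tile_shape(128, 256, 32, swapped, is_1d)
            print(f"AB_swapped={swapped} epilogue_is_1d={is_1d} -> {shape}")
```

Under this reading, the rows test `epilogue_is_1d` first while the cols test `AB_swapped` first, so when both flags are set the tile is 1 row by `BM` columns.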

EpilogueLoadTileArray​

comptime EpilogueLoadTileArray = SMemTileArray2DRowMajor[c_type, Int(1) if config.epilogue_is_1d else config.mma_shape[Int(1)] if config.AB_swapped else config.block_tile_shape[Int(0)], config.block_tile_shape[Int(0)] if config.AB_swapped else config.mma_shape[Int(1)] if config.epilogue_is_1d else config.block_tile_shape[Int(0)] if config.AB_swapped else config.output_tile_shape[Int(1)], config.num_accum_pipeline_stages if config.AB_swapped or config.epilogue_is_1d else config.num_tma_epilogue_pipeline_stages if config.use_tma_epilogue_load else Int(0)]

InputTiles​

comptime InputTiles = StandardTileStorage[a_type, b_type, IndexList(config.block_tile_shape[Int(0)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), IndexList(config.block_tile_shape[Int(1)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), config.num_pipeline_stages]

Layouts​

comptime Layouts = SmemLayouts[a_type, b_type, config.block_tile_shape[Int(0)], config.block_tile_shape[Int(1)], config.block_tile_shape[Int(2)], config.output_tile_shape[Int(0)], config.output_tile_shape[Int(1)], config.a_swizzle, config.b_swizzle, transpose_b]

MMA_N​

comptime MMA_N = config.mma_shape[Int(1)]

num_accum_pipeline_stages​

comptime num_accum_pipeline_stages = config.num_accum_pipeline_stages

num_clc_pipeline_stages​

comptime num_clc_pipeline_stages = config.num_clc_pipeline_stages

num_epilogue_load_stages​

comptime num_epilogue_load_stages = config.num_accum_pipeline_stages if config.AB_swapped or config.epilogue_is_1d else config.num_tma_epilogue_pipeline_stages if config.use_tma_epilogue_load else Int(0)
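
The stage-count selection above can be unrolled into plain branches. This is a small Python model of the expression (not Mojo), assuming `or` binds tighter than the conditional expression, as it does in Python. The function is hypothetical; its parameters mirror the `config` fields named in the listing.

```python
def num_epilogue_load_stages(ab_swapped, epilogue_is_1d, use_tma_epilogue_load,
                             num_accum_pipeline_stages,
                             num_tma_epilogue_pipeline_stages):
    """Model of the num_epilogue_load_stages comptime expression."""
    if ab_swapped or epilogue_is_1d:
        # Swapped or 1D epilogues reuse the accumulator stage count.
        return num_accum_pipeline_stages
    if use_tma_epilogue_load:
        # Otherwise the TMA epilogue load gets its own stage count.
        return num_tma_epilogue_pipeline_stages
    # No epilogue load tiles are staged at all.
    return 0


if __name__ == "__main__":
    # Illustrative stage counts only.
    print(num_epilogue_load_stages(False, False, False, 2, 4))
    print(num_epilogue_load_stages(False, False, True, 2, 4))
    print(num_epilogue_load_stages(True, False, False, 2, 4))
```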

num_group_pipeline_stages​

comptime num_group_pipeline_stages = (config // config)

num_output_stages​

comptime num_output_stages = config.num_output_stages

num_pipeline_stages​

comptime num_pipeline_stages = config.num_pipeline_stages

OutputM​

comptime OutputM = config.output_tile_shape[Int(0)]

OutputN​

comptime OutputN = config.output_tile_shape[Int(1)]

OutputTiles​

comptime OutputTiles = OutputTileStorage[c_type, config.output_tile_shape[Int(0)], config.output_tile_shape[Int(1)], config.num_output_stages]

Pipelines​

comptime Pipelines = SmemPipelineBundle[(config // config), config.num_accum_pipeline_stages, config.num_clc_pipeline_stages, StandardTilePayload[a_type, b_type, IndexList(config.block_tile_shape[Int(0)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), IndexList(config.block_tile_shape[Int(1)], config.block_tile_shape[Int(2)], __list_literal__=NoneType(None)), config.num_pipeline_stages], config.num_accum_pipeline_stages if config.AB_swapped or config.epilogue_is_1d else config.num_tma_epilogue_pipeline_stages if config.use_tma_epilogue_load else Int(0)]

Methods​

a_tiles​

def a_tiles(ref[AddressSpace._value] self) -> Self.ATileArray

Returns:

Self.ATileArray

b_tiles​

def b_tiles(ref[AddressSpace._value] self) -> Self.BTileArray

Returns:

Self.BTileArray

c_tiles​

def c_tiles(ref[AddressSpace._value] self) -> Self.CTileArray

Returns:

Self.CTileArray

epilogue_load_tiles​

def epilogue_load_tiles(ref[AddressSpace._value] self) -> Self.EpilogueLoadTileArray

Returns:

Self.EpilogueLoadTileArray

ab_pipeline_size​

static def ab_pipeline_size() -> Int

Total size of A+B tiles for all pipeline stages (in elements).

Returns:

Int
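
As a rough illustration of what this count could look like, the Python sketch below (not Mojo) assumes the A tile is `BM x BK`, the B tile is `BN x BK` (the shapes passed to `InputTiles`), and that no padding is added per stage. The page does not show the method body, so the formula is an assumption and the function is a hypothetical stand-in.

```python
def ab_pipeline_size(bm, bn, bk, num_pipeline_stages):
    """Assumed element count of A+B tiles across all pipeline stages."""
    a_tile_elems = bm * bk  # one A tile: BM x BK
    b_tile_elems = bn * bk  # one B tile: BN x BK
    return (a_tile_elems + b_tile_elems) * num_pipeline_stages


if __name__ == "__main__":
    # Illustrative block shape (128, 128, 64) with 4 stages.
    print(ab_pipeline_size(128, 128, 64, 4))  # 65536
```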

c_output_size​

static def c_output_size() -> Int

Size of C tiles for all output stages (in elements).

Returns:

Int
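
A comparable sketch for the output side, again in Python rather than Mojo: it assumes each C tile is `OutputM x OutputN` (the shape passed to `OutputTiles`) and that stages are stored back to back without padding. The method body is not shown on this page, so treat the formula as an assumption.

```python
def c_output_size(output_m, output_n, num_output_stages):
    """Assumed element count of C tiles across all output stages."""
    return output_m * output_n * num_output_stages


if __name__ == "__main__":
    # Illustrative output tile of 128 x 32 with 2 stages.
    print(c_output_size(128, 32, 2))  # 8192
```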

epilogue_load_tile_size​

static def epilogue_load_tile_size() -> Int

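Size of epilogue load tiles for all stages (in elements). Zero when num_epilogue_load_stages is 0; per that member's definition, this requires config.use_tma_epilogue_load=False with neither config.AB_swapped nor config.epilogue_is_1d set.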

Returns:

Int

total_tile_size​

static def total_tile_size() -> Int

Total tile storage size (A+B+C+epilogue load) in elements.

Returns:

Int
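
Because the sizes above are reported in elements and `a_type`, `b_type`, and `c_type` may differ in width, a single element count cannot be converted to bytes with one factor. The Python sketch below (not Mojo) shows one way to estimate tile storage in bytes from the per-array element counts. It assumes the epilogue load tiles use `c_type`, as the `EpilogueLoadTileArray` definition indicates, and it ignores barrier and pipeline storage. All names are hypothetical.

```python
def total_tile_elems(a_elems, b_elems, c_elems, epilogue_elems):
    """Element total in the spirit of total_tile_size (A+B+C+epilogue load)."""
    return a_elems + b_elems + c_elems + epilogue_elems


def tile_storage_bytes(a_elems, b_elems, c_elems, epilogue_elems,
                       a_itemsize, b_itemsize, c_itemsize):
    """Estimate bytes by weighting each array with its own element width."""
    return (a_elems * a_itemsize
            + b_elems * b_itemsize
            + (c_elems + epilogue_elems) * c_itemsize)


if __name__ == "__main__":
    # Illustrative counts: 4 stages of 128x64 A and B tiles, 2 stages of
    # 128x32 C tiles, no epilogue load tiles, 2-byte elements throughout.
    print(total_tile_elems(32768, 32768, 8192, 0))             # 73728
    print(tile_storage_bytes(32768, 32768, 8192, 0, 2, 2, 2))  # 147456
```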