Mojo struct
HopperMatmulSM90Kernel_SMem
struct HopperMatmulSM90Kernel_SMem[a_type: DType, b_type: DType, c_type: DType, BM: Int, BN: Int, BK: Int, WG_BM: Int, WG_BN: Int, num_pipeline_stages: Int, k_group_size: Int, swizzle_bytes: Int = Int(128)]
Shared memory layout for Hopper SM90 matrix multiplication kernel.
This struct manages the shared memory allocation for:
- Input tiles (A and B matrices) with multi-stage pipelining
- Output tile (C matrix) for accumulation
- Synchronization barriers for producer-consumer coordination
The memory is organized to support asynchronous loads and efficient bank-conflict-free access patterns for tensor core operations.
All tiles use TileTensor-based types from tile_types.mojo. At TMA/WGMMA boundaries, pass {tile.ptr} to construct the tile view.
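The three regions listed above can be sized with simple arithmetic. The following is a minimal Python model of that arithmetic, not the Mojo implementation. It assumes each A stage is a BM x BK tile, each B stage is a BN x BK tile, each barrier pair is two 8-byte mbarrier objects, and that no alignment padding is added. The function name `smem_budget` and the dtype table are hypothetical.

```python
# Illustrative model only: tile shapes and barrier layout are assumptions,
# not values read from the Mojo implementation.
DTYPE_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2}
MBARRIER_BYTES = 8  # a Hopper mbarrier object is 64 bits wide


def smem_budget(a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN,
                num_pipeline_stages, k_group_size):
    """Estimate bytes per shared memory region, ignoring alignment padding."""
    # Assumed: one BM x BK tile of A per pipeline stage.
    a_bytes = BM * BK * DTYPE_BYTES[a_type] * num_pipeline_stages
    # Assumed: one BN x BK tile of B per pipeline stage.
    b_bytes = BN * BK * DTYPE_BYTES[b_type] * num_pipeline_stages
    # From the CTileArray definition: a single WG_BM x WG_BN tile.
    c_bytes = WG_BM * WG_BN * DTYPE_BYTES[c_type]
    # From the barriers field: one pair per group of k_group_size stages.
    num_pairs = num_pipeline_stages // k_group_size
    # Assumed: each pair holds two mbarriers.
    barrier_bytes = 2 * MBARRIER_BYTES * num_pairs
    return {
        "a_tiles": a_bytes,
        "b_tiles": b_bytes,
        "c_tile": c_bytes,
        "barriers": barrier_bytes,
        "total": a_bytes + b_bytes + c_bytes + barrier_bytes,
    }


budget = smem_budget("bfloat16", "bfloat16", "bfloat16",
                     BM=128, BN=256, BK=64, WG_BM=64, WG_BN=256,
                     num_pipeline_stages=4, k_group_size=1)
print(budget)
```

Under these assumptions the input tiles dominate the budget, so `num_pipeline_stages` is the parameter that most directly trades pipelining depth against shared memory use.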
Fields
- a_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray.Storage)
- b_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray.Storage)
- c_tile_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Storage)
- barriers (BarrierPair[(num_pipeline_stages // k_group_size)])
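The barrier count is the only field whose size is derived from two parameters. As a rough Python sketch of that relationship: `num_barrier_pairs` mirrors the `num_pipeline_stages // k_group_size` expression from the field type, while `stages_for_pair` encodes an assumption (not confirmed by this page) that each pair guards a contiguous run of `k_group_size` stages. Both function names are hypothetical.

```python
def num_barrier_pairs(num_pipeline_stages, k_group_size):
    """Mirror the BarrierPair size expression from the field type."""
    if k_group_size <= 0:
        raise ValueError("k_group_size must be positive")
    return num_pipeline_stages // k_group_size


def stages_for_pair(pair_index, k_group_size):
    """Assumption: pair p guards the contiguous stages [p*k, (p+1)*k)."""
    start = pair_index * k_group_size
    return list(range(start, start + k_group_size))


pairs = num_barrier_pairs(num_pipeline_stages=8, k_group_size=2)
groups = [stages_for_pair(p, 2) for p in range(pairs)]
print(pairs, groups)
```

Because the division floors, a stage count that is not a multiple of `k_group_size` would leave trailing stages without a pair under this grouping, which suggests choosing `num_pipeline_stages` as a multiple of `k_group_size`.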
Implemented traits
comptime members
ATileArray
comptime ATileArray = SMemTileArrayWithLayout[a_type, Layout(Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt())), Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt()))), num_pipeline_stages]
BTileArray
comptime BTileArray = SMemTileArrayWithLayout[b_type, Layout(Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt())), Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt()))), num_pipeline_stages]
CTile
comptime CTile = HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Tile
CTileArray
comptime CTileArray = SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, Int(1)]
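The name `SMemTileArray2DRowMajor` indicates a row-major WG_BM x WG_BN tile. Assuming plain row-major addressing with no swizzle or padding (this page does not confirm either), the element offset within the C tile would follow the usual formula. The helper `row_major_offset` below is hypothetical, written in Python purely to illustrate the indexing.

```python
def row_major_offset(row, col, WG_BM, WG_BN):
    """Element offset of (row, col) in a WG_BM x WG_BN row-major tile."""
    if not (0 <= row < WG_BM and 0 <= col < WG_BN):
        raise IndexError("coordinate outside the tile")
    # Consecutive columns are adjacent; each row starts WG_BN elements later.
    return row * WG_BN + col


first_of_second_row = row_major_offset(1, 0, WG_BM=64, WG_BN=256)
last_element = row_major_offset(63, 255, WG_BM=64, WG_BN=256)
print(first_of_second_row, last_element)
```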
Methods
a_tiles
def a_tiles(ref[AddressSpace._value] self) -> Self.ATileArray
Get A tile array accessor (TileTensor-based).
Returns:
Self.ATileArray
b_tiles
def b_tiles(ref[AddressSpace._value] self) -> Self.BTileArray
Get B tile array accessor (TileTensor-based).
Returns:
Self.BTileArray
c_tile
def c_tile(ref[AddressSpace._value] self) -> Self.CTile
Get C tile accessor (TileTensor-based).
Returns:
Self.CTile
create_pipeline
def create_pipeline(ref[AddressSpace._value] self) -> ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]
Create producer-consumer pipeline from barrier storage.
Returns:
ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]
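A producer-consumer pipeline over a fixed number of barrier slots is commonly driven by a slot index plus a phase bit that flips each time the ring wraps. The Python sketch below shows that general technique only. It is not taken from the `ProducerConsumerPipeline` source, and `slot_and_phase` is a hypothetical helper.

```python
def slot_and_phase(iteration, num_slots):
    """Map a loop iteration to a ring-buffer slot and a phase parity bit."""
    slot = iteration % num_slots
    # The parity flips on every full pass over the ring, letting a waiter
    # tell the current use of a slot apart from the previous one.
    phase = (iteration // num_slots) % 2
    return slot, phase


# The slot count follows the pipeline's type parameter:
# num_pipeline_stages // k_group_size.
num_slots = 8 // 2
schedule = [slot_and_phase(i, num_slots) for i in range(9)]
print(schedule)
```

In this model the producer and consumer each keep their own iteration counter, and the consumer trails the producer by at most `num_slots` iterations.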
pipeline_storage_size
static def pipeline_storage_size() -> Int
Calculate the memory size for all pipeline stages.
Returns:
Int
output_storage_size
storage_size