Mojo struct
HopperMatmulSM90Kernel_SMem
struct HopperMatmulSM90Kernel_SMem[a_type: DType, b_type: DType, c_type: DType, BM: Int, BN: Int, BK: Int, WG_BM: Int, WG_BN: Int, num_pipeline_stages: Int, k_group_size: Int, swizzle_bytes: Int = 128]
Shared memory layout for Hopper SM90 matrix multiplication kernel.
This struct manages the shared memory allocation for:
- Input tiles (A and B matrices) with multi-stage pipelining
- Output tile (C matrix) for accumulation
- Synchronization barriers for producer-consumer coordination
The memory is organized to support asynchronous loads and efficient bank-conflict-free access patterns for tensor core operations.
All tiles use TileTensor-based types from tile_types.mojo. At TMA/WGMMA boundaries, pass {tile.ptr} to construct LayoutTensor.
Fields
- a_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray.Storage)
- b_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray.Storage)
- c_tile_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Storage)
- barriers (BarrierPair[(num_pipeline_stages // k_group_size)])
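To get a feel for how much shared memory these four fields occupy, the following Python sketch estimates the footprint from the struct parameters. It is a rough model, not the Mojo implementation: it assumes each A and B stage holds a full BM x BK or BN x BK tile, that a BarrierPair holds two 8-byte barriers per slot, and that no alignment padding is added. The class name `SMemConfig`, the helper names, and the example parameter values are hypothetical.

```python
from dataclasses import dataclass

# Assumption: one barrier is a 64-bit word, and a BarrierPair keeps
# two of them (full + empty) per pipeline slot.
BARRIER_BYTES = 8
# Maximum shared memory per thread block on compute capability 9.0.
SM90_MAX_SMEM_BYTES = 227 * 1024


@dataclass(frozen=True)
class SMemConfig:  # hypothetical helper, mirrors the struct parameters
    a_bytes: int  # size_of[a_type]()
    b_bytes: int  # size_of[b_type]()
    c_bytes: int  # size_of[c_type]()
    BM: int
    BN: int
    BK: int
    WG_BM: int
    WG_BN: int
    num_pipeline_stages: int
    k_group_size: int

    def a_tiles_bytes(self) -> int:
        # One BM x BK tile per pipeline stage.
        return self.BM * self.BK * self.a_bytes * self.num_pipeline_stages

    def b_tiles_bytes(self) -> int:
        # One BN x BK tile per pipeline stage.
        return self.BN * self.BK * self.b_bytes * self.num_pipeline_stages

    def c_tile_bytes(self) -> int:
        # A single row-major WG_BM x WG_BN output tile.
        return self.WG_BM * self.WG_BN * self.c_bytes

    def barrier_bytes(self) -> int:
        num_slots = self.num_pipeline_stages // self.k_group_size
        return 2 * num_slots * BARRIER_BYTES

    def total_bytes(self) -> int:
        # Ignores any alignment padding between fields.
        return (self.a_tiles_bytes() + self.b_tiles_bytes()
                + self.c_tile_bytes() + self.barrier_bytes())


# Example values only: bf16 inputs and output, 4 stages, k_group_size 1.
cfg = SMemConfig(a_bytes=2, b_bytes=2, c_bytes=2, BM=128, BN=256, BK=64,
                 WG_BM=64, WG_BN=256, num_pipeline_stages=4, k_group_size=1)
fits = cfg.total_bytes() <= SM90_MAX_SMEM_BYTES
```

With these example values the input stages dominate the budget, which is why num_pipeline_stages is usually the first parameter to reduce when a configuration does not fit.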
Implemented traits
AnyType,
ImplicitlyDestructible
comptime members
__del__is_trivial
comptime __del__is_trivial = True
ATileArray
comptime ATileArray = SMemTileArrayWithLayout[a_type, Layout(Coord(VariadicPack(Coord(VariadicPack(Idx[8](), Idx[(BM // 8)]())), Coord(VariadicPack(Idx[(swizzle_bytes // size_of[a_type]())](), Idx[((BK * size_of[a_type]()) // swizzle_bytes)]())))), Coord(VariadicPack(Coord(VariadicPack(Idx[(swizzle_bytes // size_of[a_type]())](), Idx[(8 * (swizzle_bytes // size_of[a_type]()))]())), Coord(VariadicPack(Idx[1](), Idx[0 if (((BK * size_of[a_type]()) // swizzle_bytes) == 1)._mlir_value else (BM * (swizzle_bytes // size_of[a_type]()))]()))))), num_pipeline_stages]
BTileArray
comptime BTileArray = SMemTileArrayWithLayout[b_type, Layout(Coord(VariadicPack(Coord(VariadicPack(Idx[8](), Idx[(BN // 8)]())), Coord(VariadicPack(Idx[(swizzle_bytes // size_of[b_type]())](), Idx[((BK * size_of[b_type]()) // swizzle_bytes)]())))), Coord(VariadicPack(Coord(VariadicPack(Idx[(swizzle_bytes // size_of[b_type]())](), Idx[(8 * (swizzle_bytes // size_of[b_type]()))]())), Coord(VariadicPack(Idx[1](), Idx[0 if (((BK * size_of[b_type]()) // swizzle_bytes) == 1)._mlir_value else (BN * (swizzle_bytes // size_of[b_type]()))]()))))), num_pipeline_stages]
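The nested Layout in ATileArray and BTileArray is easier to read as an offset formula. The Python sketch below evaluates the shape and stride pairs from the definitions above for one (row, col) coordinate, where `rows` stands for BM (A tiles) or BN (B tiles). It assumes each logical index is split across the nested modes with the leftmost mode varying fastest, and it gives the element offset before any swizzle is applied. The function name `tile_offset` is hypothetical.

```python
def tile_offset(row: int, col: int, rows: int, bk: int,
                elem_bytes: int, swizzle_bytes: int = 128) -> int:
    """Element offset of (row, col) in one tile, before swizzling."""
    # Elements that fit in one swizzle-width row: swizzle_bytes // size_of[type]()
    elems_per_chunk = swizzle_bytes // elem_bytes
    # Number of swizzle-width column chunks covering BK.
    num_chunks = (bk * elem_bytes) // swizzle_bytes
    # Stride between column chunks: 0 when there is only one chunk.
    chunk_stride = 0 if num_chunks == 1 else rows * elems_per_chunk

    # Row mode: shape (8, rows // 8), stride (elems_per_chunk, 8 * elems_per_chunk)
    r0, r1 = row % 8, row // 8
    # Column mode: shape (elems_per_chunk, num_chunks), stride (1, chunk_stride)
    c0, c1 = col % elems_per_chunk, col // elems_per_chunk

    return (r0 * elems_per_chunk + r1 * 8 * elems_per_chunk
            + c0 * 1 + c1 * chunk_stride)


# bf16 (2 bytes), 128-byte swizzle: 64 elements per chunk.
single_chunk = tile_offset(3, 5, rows=128, bk=64, elem_bytes=2)
two_chunks = tile_offset(3, 70, rows=128, bk=128, elem_bytes=2)
```

Under these assumptions the row terms collapse to row * elems_per_chunk, so each tile is stored as a sequence of column chunks, each chunk being a dense rows x elems_per_chunk block whose rows are exactly swizzle_bytes wide.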
CTile
comptime CTile = HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Tile
CTileArray
comptime CTileArray = SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, 1]
Methods
a_tiles
a_tiles(ref[AddressSpace._value._mlir_value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray
Get A tile array accessor (TileTensor-based).
Returns:
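HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray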
b_tiles
b_tiles(ref[AddressSpace._value._mlir_value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray
Get B tile array accessor (TileTensor-based).
Returns:
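HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray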
c_tile
c_tile(ref[AddressSpace._value._mlir_value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTile
Get C tile accessor (TileTensor-based).
Returns:
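HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTile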
create_pipeline
create_pipeline(ref[AddressSpace._value._mlir_value] self) -> ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]
Create producer-consumer pipeline from barrier storage.
Returns:
ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]
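The pipeline has num_pipeline_stages // k_group_size barrier slots, which suggests that one barrier pair guards a group of k_group_size consecutive K tiles. The Python sketch below models that bookkeeping as a ring buffer with a parity bit, a common scheme for this kind of barrier pipeline. It is an assumption about how the slots are used, not a description of ProducerConsumerPipeline internals, and the function name `barrier_slot` is hypothetical.

```python
def barrier_slot(k_tile: int, num_pipeline_stages: int,
                 k_group_size: int) -> tuple[int, int]:
    """Map a K-tile index to (barrier slot, phase parity).

    Assumes k_group_size consecutive tiles share one barrier pair and
    that slots are reused round-robin, flipping parity on each wrap.
    """
    num_slots = num_pipeline_stages // k_group_size
    group = k_tile // k_group_size       # which group of tiles this is
    slot = group % num_slots             # ring-buffer position
    phase = (group // num_slots) & 1     # flips each time the ring wraps
    return slot, phase


# 4 stages grouped in pairs gives 2 barrier slots.
schedule = [barrier_slot(t, num_pipeline_stages=4, k_group_size=2)
            for t in range(9)]
```

In this model a larger k_group_size means fewer barriers and fewer synchronization points per K iteration, at the cost of coarser overlap between producer and consumer.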
pipeline_storage_size
static pipeline_storage_size() -> Int
Calculate the memory size for all pipeline stages.
Returns:
Int
output_storage_size
storage_size