
Mojo struct

HopperMatmulSM90Kernel_SMem

struct HopperMatmulSM90Kernel_SMem[a_type: DType, b_type: DType, c_type: DType, BM: Int, BN: Int, BK: Int, WG_BM: Int, WG_BN: Int, num_pipeline_stages: Int, k_group_size: Int, swizzle_bytes: Int = 128]

Shared memory layout for the Hopper (SM90) matrix multiplication kernel.

This struct manages the shared memory allocation for:

  • Input tiles (A and B matrices) with multi-stage pipelining
  • Output tile (C matrix) for accumulation
  • Synchronization barriers for producer-consumer coordination

The memory is organized to support asynchronous loads and bank-conflict-free access patterns for the tensor core (WGMMA) operations.
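The bank-conflict-free property can be illustrated with a plain-Python model of the standard 128-byte swizzle, the same XOR pattern as CuTe's Swizzle<3, 4, 3> applied to byte offsets. This is a sketch, not this kernel's code: it assumes that swizzle_bytes = 128 selects that pattern and that shared memory has 32 four-byte banks. It checks that when eight rows read the same logical 16-byte chunk, the swizzled addresses spread across all 32 banks, while unswizzled addresses all land on the same four banks.

```python
# Plain-Python model of the 128-byte swizzle (CuTe Swizzle<3, 4, 3> on byte
# offsets). Assumption: swizzle_bytes = 128 in this kernel uses this pattern.
SWIZZLE_BYTES = 128  # one swizzled row spans 128 bytes
CHUNK_BYTES = 16     # the swizzle permutes 16-byte chunks within a row
NUM_BANKS = 32
BANK_BYTES = 4


def swizzle_128b(byte_offset: int) -> int:
    # XOR bits [4, 7) (16-byte chunk within the row) with
    # bits [7, 10) (row index within the 8-row, 1024-byte pattern).
    return byte_offset ^ ((byte_offset >> 3) & 0x70)


def banks_touched(byte_offset: int) -> set[int]:
    # A 16-byte access starting at byte_offset covers four 4-byte banks.
    return {((byte_offset + i) // BANK_BYTES) % NUM_BANKS
            for i in range(0, CHUNK_BYTES, BANK_BYTES)}


def column_banks(chunk: int, swizzled: bool) -> list[set[int]]:
    # Banks hit when 8 consecutive rows each read logical chunk `chunk`.
    result = []
    for row in range(8):
        off = row * SWIZZLE_BYTES + chunk * CHUNK_BYTES
        if swizzled:
            off = swizzle_128b(off)
        result.append(banks_touched(off))
    return result


for chunk in range(8):
    plain = column_banks(chunk, swizzled=False)
    swz = column_banks(chunk, swizzled=True)
    # Without swizzling, every row maps to the same four banks (8-way conflict).
    assert all(b == plain[0] for b in plain)
    # With swizzling, the eight rows cover all 32 banks exactly once.
    assert len(set().union(*swz)) == NUM_BANKS

print("128B swizzle spreads a column read across all 32 banks")
```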

All tiles use TileTensor-based types from tile_types.mojo. At TMA/WGMMA boundaries, pass tile.ptr to construct a LayoutTensor.

Fields

  • a_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray.Storage): Backing storage for the num_pipeline_stages A input tiles.
  • b_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray.Storage): Backing storage for the num_pipeline_stages B input tiles.
  • c_tile_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Storage): Backing storage for the single WG_BM x WG_BN output (C) tile.
  • barriers (BarrierPair[(num_pipeline_stages // k_group_size)]): num_pipeline_stages // k_group_size barrier pairs used for producer-consumer synchronization.
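The barrier count follows from integer division in the field type. The sketch below models it in plain Python; the mapping of a pipeline stage to a barrier pair (contiguous groups of k_group_size stages) and the divisibility check are assumptions for illustration, not documented behavior of this struct.

```python
# Hypothetical model of the barrier bookkeeping implied by
# BarrierPair[num_pipeline_stages // k_group_size].
def num_barrier_pairs(num_pipeline_stages: int, k_group_size: int) -> int:
    # Assumption: stages divide evenly into groups; otherwise the trailing
    # stages would have no barrier pair of their own.
    assert num_pipeline_stages % k_group_size == 0, "stages should be a multiple of k_group_size"
    return num_pipeline_stages // k_group_size


def barrier_for_stage(stage: int, k_group_size: int) -> int:
    # Assumption: stages are grouped contiguously, so stage s uses pair s // k_group_size.
    return stage // k_group_size


stages, group = 4, 2
pairs = num_barrier_pairs(stages, group)
print(pairs, [barrier_for_stage(s, group) for s in range(stages)])  # 2 [0, 0, 1, 1]
```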

Implemented traits

AnyType, ImplicitlyDestructible

comptime members

ATileArray

comptime ATileArray = SMemTileArrayWithLayout[a_type, Layout(Coord(Coord(Idx[8](), Idx[(BM // 8)]()), Coord(Idx[(swizzle_bytes // size_of[a_type]())](), Idx[((BK * size_of[a_type]()) // swizzle_bytes)]())), Coord(Coord(Idx[(swizzle_bytes // size_of[a_type]())](), Idx[(8 * (swizzle_bytes // size_of[a_type]()))]()), Coord(Idx[1](), Idx[0 if (((BK * size_of[a_type]()) // swizzle_bytes) == 1) else (BM * (swizzle_bytes // size_of[a_type]()))]()))), num_pipeline_stages]
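To see what the ATileArray layout expression describes for one stage, it can be evaluated in plain Python. This sketch assumes CuTe-style semantics: each hierarchical mode is split colexicographically (row = r0 + 8 * r1, col = c0 + sw * c1, with sw = swizzle_bytes // size_of[a_type]()), and the element offset is the dot product of those coordinates with the strides. The offsets are positions before any swizzle is applied. Substituting BN for BM gives the BTileArray layout.

```python
# Plain-Python evaluation of the ATileArray layout expression (one stage).
# Assumption: colexicographic decomposition of hierarchical modes, as in CuTe.
def a_tile_layout(BM: int, BK: int, elem_bytes: int, swizzle_bytes: int = 128):
    sw = swizzle_bytes // elem_bytes               # elements per swizzle row
    k_atoms = (BK * elem_bytes) // swizzle_bytes   # swizzle-width panels along K
    shape = ((8, BM // 8), (sw, k_atoms))
    stride = ((sw, 8 * sw), (1, 0 if k_atoms == 1 else BM * sw))
    return shape, stride


def tile_offset(shape, stride, row: int, col: int) -> int:
    (r0_n, _), (c0_n, _) = shape
    (r0_s, r1_s), (c0_s, c1_s) = stride
    # Split row into (row % 8, row // 8) and col into (col % sw, col // sw),
    # then take the dot product with the matching strides.
    return ((row % r0_n) * r0_s + (row // r0_n) * r1_s
            + (col % c0_n) * c0_s + (col // c0_n) * c1_s)


shape, stride = a_tile_layout(BM=128, BK=64, elem_bytes=2)  # e.g. bf16
print(shape, stride)  # ((8, 16), (64, 1)) ((64, 512), (1, 0))
```

With 2-byte elements and BK = 64, each tile row is exactly one 128-byte swizzle row, so the layout reduces to row-major BM x 64. With BK = 128, the second 64-column panel starts BM * 64 elements later, so the tile is stored as BK // 64 column panels, each BM rows tall.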

BTileArray

comptime BTileArray = SMemTileArrayWithLayout[b_type, Layout(Coord(Coord(Idx[8](), Idx[(BN // 8)]()), Coord(Idx[(swizzle_bytes // size_of[b_type]())](), Idx[((BK * size_of[b_type]()) // swizzle_bytes)]())), Coord(Coord(Idx[(swizzle_bytes // size_of[b_type]())](), Idx[(8 * (swizzle_bytes // size_of[b_type]()))]()), Coord(Idx[1](), Idx[0 if (((BK * size_of[b_type]()) // swizzle_bytes) == 1) else (BN * (swizzle_bytes // size_of[b_type]()))]()))), num_pipeline_stages]

CTile

comptime CTile = HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Tile

CTileArray

comptime CTileArray = SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, 1]
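Unlike the A and B arrays, the C tile array is a plain row-major 2-D tile. A minimal plain-Python model follows; it assumes, by analogy with the trailing num_pipeline_stages argument of the A and B arrays, that the final 1 is the stage count. The WG_BM and WG_BN values below are illustrative only.

```python
# Plain-Python model of SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, 1].
# Assumption: the trailing 1 is the number of stages (no pipelining for C).
def c_tile_offset(row: int, col: int, WG_BM: int, WG_BN: int) -> int:
    # Row-major: consecutive columns are adjacent, rows are WG_BN elements apart.
    assert 0 <= row < WG_BM and 0 <= col < WG_BN
    return row * WG_BN + col


def c_tile_elements(WG_BM: int, WG_BN: int, num_stages: int = 1) -> int:
    # Total element count across all stages of the C tile array.
    return num_stages * WG_BM * WG_BN


WG_BM, WG_BN = 64, 128  # example values, not taken from the kernel
print(c_tile_offset(1, 0, WG_BM, WG_BN), c_tile_elements(WG_BM, WG_BN))  # 128 8192
```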

Methods

a_tiles

a_tiles(ref[AddressSpace._value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray

Get the A tile array accessor (TileTensor-based).

Returns:

HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray

b_tiles

b_tiles(ref[AddressSpace._value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray

Get the B tile array accessor (TileTensor-based).

Returns:

HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray

c_tile

c_tile(ref[AddressSpace._value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTile

Get the C tile accessor (TileTensor-based).

Returns:

HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTile

create_pipeline

create_pipeline(ref[AddressSpace._value] self) -> ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]

Create a producer-consumer pipeline from the barrier storage.

Returns:

ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]
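The pipeline has num_pipeline_stages // k_group_size slots. The sketch below is a hypothetical plain-Python model of the slot and phase bookkeeping that CUTLASS-style producer-consumer pipelines use; it is not the ProducerConsumerPipeline API. It assumes that each slot's barrier is reused every time the index wraps, so a phase bit that flips on wrap-around lets a waiter tell the current fill of a slot from the previous one.

```python
from dataclasses import dataclass


# Hypothetical model of slot/phase tracking over num_pipeline_stages // k_group_size slots.
# Assumption: the wait phase flips every time the slot index wraps back to 0.
@dataclass
class PipelineState:
    num_slots: int
    index: int = 0
    phase: int = 0

    def advance(self) -> None:
        # Move to the next slot; flip the phase bit when wrapping to slot 0.
        self.index += 1
        if self.index == self.num_slots:
            self.index = 0
            self.phase ^= 1


def trace(num_pipeline_stages: int, k_group_size: int, steps: int) -> list[tuple[int, int]]:
    # Record (slot, phase) for each of the first `steps` iterations.
    state = PipelineState(num_pipeline_stages // k_group_size)
    history = []
    for _ in range(steps):
        history.append((state.index, state.phase))
        state.advance()
    return history


print(trace(4, 1, 6))  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```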

pipeline_storage_size

static pipeline_storage_size() -> Int

Calculate the memory size for all pipeline stages.

Returns:

Int

output_storage_size

static output_storage_size() -> Int

Calculate the memory size for the output tile.

Returns:

Int

storage_size

static storage_size() -> Int

Calculate the total storage size.

Returns:

Int
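A rough estimate of what the three size helpers account for can be written in plain Python. This is a hypothetical model, not the struct's implementation: it assumes sizes are in bytes, that each pipeline stage holds one BM x BK A tile and one BN x BK B tile (as the ATileArray and BTileArray layouts imply), that the output is one WG_BM x WG_BN tile, and that barrier storage is an opaque byte count (barrier_bytes) because its real size is not documented here. Alignment padding is ignored.

```python
# Hypothetical byte-count model of pipeline_storage_size, output_storage_size,
# and storage_size. barrier_bytes is an assumed, caller-supplied value.
def pipeline_storage_size(BM, BN, BK, a_bytes, b_bytes, num_pipeline_stages):
    # Each stage holds one BM x BK A tile and one BN x BK B tile.
    return num_pipeline_stages * (BM * BK * a_bytes + BN * BK * b_bytes)


def output_storage_size(WG_BM, WG_BN, c_bytes):
    # A single WG_BM x WG_BN C tile.
    return WG_BM * WG_BN * c_bytes


def storage_size(BM, BN, BK, WG_BM, WG_BN, a_bytes, b_bytes, c_bytes,
                 num_pipeline_stages, barrier_bytes):
    # Sum of the parts; ignores any alignment padding between regions.
    return (pipeline_storage_size(BM, BN, BK, a_bytes, b_bytes, num_pipeline_stages)
            + output_storage_size(WG_BM, WG_BN, c_bytes)
            + barrier_bytes)


# Example: 2-byte A/B/C elements, 4 stages, 4 barrier pairs of two 8-byte barriers.
print(storage_size(BM=128, BN=128, BK=64, WG_BM=64, WG_BN=128,
                   a_bytes=2, b_bytes=2, c_bytes=2,
                   num_pipeline_stages=4, barrier_bytes=4 * 2 * 8))  # 147520
```

Comparing such an estimate with the device's per-block shared memory limit is a quick way to judge whether a given BM, BN, BK, and stage count can fit.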