
Mojo struct

HopperMatmulSM90Kernel_SMem

struct HopperMatmulSM90Kernel_SMem[a_type: DType, b_type: DType, c_type: DType, BM: Int, BN: Int, BK: Int, WG_BM: Int, WG_BN: Int, num_pipeline_stages: Int, k_group_size: Int, swizzle_bytes: Int = 128]

Shared memory layout for Hopper SM90 matrix multiplication kernel.

This struct manages the shared memory allocation for:

  • Input tiles (A and B matrices) with multi-stage pipelining
  • Output tile (C matrix) that stages results before they are written out (WGMMA accumulates in registers, not shared memory)
  • Synchronization barriers for producer-consumer coordination

The memory is organized to support asynchronous loads and efficient bank-conflict-free access patterns for tensor core operations.
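Bank-conflict-free access comes from swizzling. The sketch below assumes that the default swizzle_bytes = 128 mode is the standard TMA/WGMMA 128-byte swizzle (CUTLASS's Swizzle<3, 4, 3> on byte offsets). The helper names are hypothetical, and the bank model is simplified to 32 banks of 4 bytes with 16-byte accesses.

```python
# Simplified model of the 128-byte swizzle. Assumption: it matches the mode
# selected by swizzle_bytes=128; this struct does not expose such a function.

BANK_WIDTH_BYTES = 4   # each shared-memory bank is 4 bytes wide
NUM_BANKS = 32         # 32 banks -> one 128-byte "row" of banks
CHUNK_BYTES = 16       # the swizzle permutes 16-byte chunks


def swizzle_128b(byte_offset: int) -> int:
    """XOR bits [4:7) (16-byte chunk within a 128-byte row) with bits [7:10)
    (row index within an 8-row, 1024-byte block)."""
    row_bits = (byte_offset >> 7) & 0b111
    return byte_offset ^ (row_bits << 4)


def banks_touched(byte_offsets):
    """Set of banks hit by 16-byte accesses starting at the given offsets."""
    banks = set()
    for off in byte_offsets:
        first = (off // BANK_WIDTH_BYTES) % NUM_BANKS
        for i in range(CHUNK_BYTES // BANK_WIDTH_BYTES):
            banks.add((first + i) % NUM_BANKS)
    return banks


# Eight 128-byte rows, each read at chunk 0 (the same K slice of 8 rows).
plain = [row * 128 for row in range(8)]
swizzled = [swizzle_128b(off) for off in plain]

print(len(banks_touched(plain)))     # 4  -> all rows collide on banks 0-3
print(len(banks_touched(swizzled)))  # 32 -> every bank used exactly once
```

Without the swizzle, all eight rows map to the same four banks. With it, row r lands in chunk r, so one 8-row access spreads across all 32 banks.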

All tiles use TileTensor-based types from tile_types.mojo. At TMA and WGMMA boundaries, which take a LayoutTensor, construct one by passing {tile.ptr}.

Fields

  • a_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray.Storage): Backing storage for the A input tiles, one per pipeline stage.
  • b_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray.Storage): Backing storage for the B input tiles, one per pipeline stage.
  • c_tile_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Storage): Backing storage for the single WG_BM x WG_BN output tile.
  • barriers (BarrierPair[(num_pipeline_stages // k_group_size)]): Producer-consumer synchronization barriers, one pair per group of k_group_size pipeline stages.

Implemented traits

AnyType, ImplicitlyDestructible

comptime members

__del__is_trivial

comptime __del__is_trivial = True

ATileArray

comptime ATileArray = SMemTileArrayWithLayout[a_type, Layout(Coord(VariadicPack(Coord(VariadicPack(Idx[8](), Idx[(BM // 8)]())), Coord(VariadicPack(Idx[(swizzle_bytes // size_of[a_type]())](), Idx[((BK * size_of[a_type]()) // swizzle_bytes)]())))), Coord(VariadicPack(Coord(VariadicPack(Idx[(swizzle_bytes // size_of[a_type]())](), Idx[(8 * (swizzle_bytes // size_of[a_type]()))]())), Coord(VariadicPack(Idx[1](), Idx[0 if (((BK * size_of[a_type]()) // swizzle_bytes) == 1)._mlir_value else (BM * (swizzle_bytes // size_of[a_type]()))]()))))), num_pipeline_stages]

BTileArray

comptime BTileArray = SMemTileArrayWithLayout[b_type, Layout(Coord(VariadicPack(Coord(VariadicPack(Idx[8](), Idx[(BN // 8)]())), Coord(VariadicPack(Idx[(swizzle_bytes // size_of[b_type]())](), Idx[((BK * size_of[b_type]()) // swizzle_bytes)]())))), Coord(VariadicPack(Coord(VariadicPack(Idx[(swizzle_bytes // size_of[b_type]())](), Idx[(8 * (swizzle_bytes // size_of[b_type]()))]())), Coord(VariadicPack(Idx[1](), Idx[0 if (((BK * size_of[b_type]()) // swizzle_bytes) == 1)._mlir_value else (BN * (swizzle_bytes // size_of[b_type]()))]()))))), num_pipeline_stages]
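The A and B layout expressions above share one shape: ((8, rows // 8), (S, BK * size_of // swizzle_bytes)) with strides ((S, 8 * S), (1, 0 or rows * S)), where S = swizzle_bytes // size_of and rows is BM for A or BN for B. The sketch below evaluates that expression for concrete parameters. It assumes CuTe-style colexicographic decomposition of hierarchical coordinates (first sub-mode fastest), and its helper names are hypothetical. Offsets are in elements and come before the hardware swizzle is applied.

```python
# Mirrors the comptime ATileArray / BTileArray layout expressions.
# Assumption: coordinates decompose colexicographically, as in CuTe.

def smem_tile_layout(rows, bk, elem_bytes, swizzle_bytes=128):
    s = swizzle_bytes // elem_bytes              # elements per 128-byte row
    k_atoms = (bk * elem_bytes) // swizzle_bytes  # swizzle atoms along K
    shape = ((8, rows // 8), (s, k_atoms))
    stride = ((s, 8 * s), (1, 0 if k_atoms == 1 else rows * s))
    return shape, stride


def offset(layout, m, k):
    """Element offset of logical (m, k), ignoring the swizzle permutation."""
    (shape_m, shape_k), (stride_m, stride_k) = layout
    m0, m1 = m % shape_m[0], m // shape_m[0]
    k0, k1 = k % shape_k[0], k // shape_k[0]
    return m0 * stride_m[0] + m1 * stride_m[1] + k0 * stride_k[0] + k1 * stride_k[1]


# bf16 A tile (2-byte elements), BM=128, BK=64: a single atom along K.
a = smem_tile_layout(rows=128, bk=64, elem_bytes=2)
print(a)                  # (((8, 16), (64, 1)), ((64, 512), (1, 0)))
print(offset(a, 3, 5))    # 3 * 64 + 5 = 197

# BK=128 needs two atoms; the second starts rows * S = 8192 elements later.
a2 = smem_tile_layout(rows=128, bk=128, elem_bytes=2)
print(offset(a2, 0, 64))  # 8192
```

Each swizzle atom is therefore a contiguous rows x S block whose rows are exactly swizzle_bytes wide. Wider K tiles are laid out as additional atoms stacked one after another.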

CTile

comptime CTile = HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Tile

CTileArray

comptime CTileArray = SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, 1]

Methods

a_tiles

a_tiles(ref[AddressSpace._value._mlir_value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray

Get the A tile array accessor (TileTensor-based), covering all num_pipeline_stages stages.

Returns:

ATileArray

b_tiles

b_tiles(ref[AddressSpace._value._mlir_value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray

Get the B tile array accessor (TileTensor-based), covering all num_pipeline_stages stages.

Returns:

BTileArray

c_tile

c_tile(ref[AddressSpace._value._mlir_value] self) -> HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTile

Get the C tile accessor (TileTensor-based).

Returns:

CTile

create_pipeline

create_pipeline(ref[AddressSpace._value._mlir_value] self) -> ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]

Create a producer-consumer pipeline from the barrier storage. The pipeline has num_pipeline_stages // k_group_size stages, one per barrier pair.

Returns:

ProducerConsumerPipeline
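Ring-buffer pipelines like this one usually track the current stage with an index and a phase bit that flips on every wrap-around, so waiters can tell a fresh barrier completion from a stale one. The sketch below models that bookkeeping after CUTLASS's PipelineState. It assumes that ProducerConsumerPipeline behaves similarly and that each stage covers k_group_size consecutive tile slots. The class and method names are hypothetical, not the Mojo API.

```python
# Hypothetical model of pipeline stage/phase bookkeeping (CUTLASS-style).

class PipelineState:
    def __init__(self, num_stages: int):
        self.num_stages = num_stages
        self.index = 0   # barrier pair / stage to use next
        self.phase = 0   # parity to wait for on that barrier

    def step(self):
        self.index += 1
        if self.index == self.num_stages:
            self.index = 0
            self.phase ^= 1  # a full lap flips the expected parity


num_pipeline_stages, k_group_size = 8, 2
# Same stage count as ProducerConsumerPipeline[num_pipeline_stages // k_group_size].
state = PipelineState(num_pipeline_stages // k_group_size)

for _ in range(5):
    # Assumption: stage i owns tile slots [i * k_group_size, (i + 1) * k_group_size).
    slots = [state.index * k_group_size + j for j in range(k_group_size)]
    print(state.index, state.phase, slots)
    state.step()
```

In this model the fifth iteration reuses slots [0, 1] with phase 1. A consumer waiting on parity 1 therefore won't mistake the previous lap's completion for new data.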

pipeline_storage_size

static pipeline_storage_size() -> Int

Calculate the shared memory size needed by all pipeline stages.

Returns:

Int

output_storage_size

static output_storage_size() -> Int

Calculate the shared memory size of the output (C) tile.

Returns:

Int

storage_size

static storage_size() -> Int

Calculate the total shared memory size of the struct.

Returns:

Int
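The three size methods together determine whether a configuration fits in shared memory. The back-of-the-envelope sketch below rests on several assumptions that do not come from the Mojo source. Pipeline storage is taken to be the A plus B tiles for every stage. Output storage is one WG_BM x WG_BN C tile. Each BarrierPair slot is assumed to hold two 8-byte mbarrier objects. Alignment padding is ignored, and the tile configuration is illustrative only.

```python
# Rough shared-memory budget; not the actual storage_size() implementation.

def smem_budget(BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size,
                a_bytes, b_bytes, c_bytes):
    a_tiles = BM * BK * a_bytes * num_pipeline_stages
    b_tiles = BN * BK * b_bytes * num_pipeline_stages
    c_tile = WG_BM * WG_BN * c_bytes
    # Assumption: two 8-byte mbarriers (full/empty) per barrier pair.
    barriers = 2 * 8 * (num_pipeline_stages // k_group_size)
    return {
        "pipeline": a_tiles + b_tiles,
        "output": c_tile,
        "barriers": barriers,
        "total": a_tiles + b_tiles + c_tile + barriers,
    }


# Hypothetical bf16 configuration; only the stage count varies.
for stages in (4, 5):
    budget = smem_budget(BM=128, BN=256, BK=64, WG_BM=64, WG_BN=256,
                         num_pipeline_stages=stages, k_group_size=1,
                         a_bytes=2, b_bytes=2, c_bytes=2)
    # H100 allows up to 227 KB of shared memory per thread block (opt-in).
    print(stages, budget["total"], budget["total"] <= 227 * 1024)
# 4 229440 True
# 5 278608 False
```

In this model the input pipeline dominates: each extra stage adds (BM + BN) * BK * 2 bytes, so num_pipeline_stages is usually the first parameter to reduce when a configuration overflows.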
