IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo struct

HopperMatmulSM90Kernel_SMem

struct HopperMatmulSM90Kernel_SMem[a_type: DType, b_type: DType, c_type: DType, BM: Int, BN: Int, BK: Int, WG_BM: Int, WG_BN: Int, num_pipeline_stages: Int, k_group_size: Int, swizzle_bytes: Int = Int(128)]

Shared memory layout for the Hopper (SM90) matrix multiplication kernel.

This struct manages the shared memory allocation for:

  • Input tiles (A and B matrices) with multi-stage pipelining
  • Output tile (C matrix) for accumulation
  • Synchronization barriers for producer-consumer coordination

The memory is organized to support asynchronous loads and efficient bank-conflict-free access patterns for tensor core operations.
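The swizzle_bytes parameter (default 128) controls how tile rows are permuted in shared memory to avoid bank conflicts. This page does not spell out the exact swizzle function. The Python sketch below assumes the common 128-byte pattern used by TMA's 128B swizzle mode: the 128-byte row index modulo 8 is XORed into the 16-byte chunk index within the row. The function names are hypothetical.

```python
def swizzle_128b(byte_offset: int) -> int:
    # Assumed 128B swizzle: XOR address bits [7, 10) (128-byte row index mod 8)
    # into bits [4, 7) (16-byte chunk index within the row). Bits [0, 4) are untouched.
    return byte_offset ^ ((byte_offset & 0x380) >> 3)


def chunk_of(byte_offset: int) -> int:
    # Index of the 16-byte chunk (0..7) within a 128-byte row.
    return (byte_offset >> 4) & 0x7


# Reading logical chunk 0 from 8 consecutive 128-byte rows:
print([chunk_of(swizzle_128b(row * 128)) for row in range(8)])
# [0, 1, 2, 3, 4, 5, 6, 7]
```

Without the swizzle, every row's chunk 0 would fall in the same 16-byte chunk slot, and therefore in the same banks. With it, a column read across 8 rows touches 8 distinct chunk slots.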

All tiles use TileTensor-based types from tile_types.mojo. At TMA/WGMMA boundaries, pass {tile.ptr} to construct the tile view.

Fields

  • a_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].ATileArray.Storage): Backing storage for the num_pipeline_stages A input tiles.
  • b_tiles_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].BTileArray.Storage): Backing storage for the num_pipeline_stages B input tiles.
  • c_tile_storage (HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Storage): Backing storage for the single WG_BM x WG_BN output (C) tile.
  • barriers (BarrierPair[(num_pipeline_stages // k_group_size)]): Synchronization barriers that coordinate producers and consumers across pipeline stages.
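The barriers field is sized by num_pipeline_stages // k_group_size, the same value that parameterizes create_pipeline. The Python sketch below shows that arithmetic. It assumes, although this page does not say so, that num_pipeline_stages is meant to be a multiple of k_group_size. The function name is hypothetical.

```python
def num_barrier_pairs(num_pipeline_stages: int, k_group_size: int) -> int:
    # Same expression as the BarrierPair parameter: num_pipeline_stages // k_group_size.
    # Assumption for this sketch: the stages split evenly into groups of k_group_size.
    if k_group_size <= 0 or num_pipeline_stages % k_group_size != 0:
        raise ValueError("num_pipeline_stages must be a multiple of a positive k_group_size")
    return num_pipeline_stages // k_group_size


# Illustrative configurations only.
print(num_barrier_pairs(4, 1))  # 4
print(num_barrier_pairs(4, 2))  # 2
```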

Implemented traits

AnyType, ImplicitlyDeletable

comptime members

ATileArray

comptime ATileArray = SMemTileArrayWithLayout[a_type, Layout(Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt())), Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt()))), num_pipeline_stages]

BTileArray

comptime BTileArray = SMemTileArrayWithLayout[b_type, Layout(Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt())), Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt()))), num_pipeline_stages]
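The layout arguments of ATileArray and BTileArray cannot be recovered from this page. What the page does show is that both arrays are parameterized by num_pipeline_stages. The Python sketch below models per-stage sizing and offsets under three assumptions: each A stage is a BM x BK tile, each B stage is a BN x BK tile, and stages are stored back to back. The helper names and the dtype table are hypothetical.

```python
# Hypothetical element sizes in bytes for a few dtypes.
DTYPE_BYTES = {"bfloat16": 2, "float16": 2, "float8_e4m3fn": 1, "float32": 4}


def stage_tile_bytes(rows: int, bk: int, dtype: str) -> int:
    # Assumes one stage holds a rows x BK tile (rows = BM for A, BN for B).
    return rows * bk * DTYPE_BYTES[dtype]


def stage_offset(stage: int, rows: int, bk: int, dtype: str, num_stages: int) -> int:
    # Assumes stages are contiguous, so stage i starts i tile sizes past the array base.
    if not 0 <= stage < num_stages:
        raise IndexError("stage out of range")
    return stage * stage_tile_bytes(rows, bk, dtype)


BM, BN, BK, STAGES = 128, 256, 64, 4  # illustrative values only
print(stage_tile_bytes(BM, BK, "bfloat16"))         # 16384 (one A stage)
print(stage_offset(2, BN, BK, "bfloat16", STAGES))  # 65536 (start of B stage 2)
```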

CTile

comptime CTile = HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Tile

CTileArray

comptime CTileArray = SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, Int(1)]
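CTileArray is a row-major array of WG_BM x WG_BN tiles for c_type. The trailing Int(1) presumably means the array holds a single tile. The Python sketch below shows how a (row, col) coordinate maps to an element offset in such a row-major tile. The function name and sizes are illustrative.

```python
def c_tile_offset(row: int, col: int, wg_bm: int, wg_bn: int) -> int:
    # Row-major: elements of a row are adjacent, and rows are wg_bn elements apart.
    if not (0 <= row < wg_bm and 0 <= col < wg_bn):
        raise IndexError("index outside the WG_BM x WG_BN tile")
    return row * wg_bn + col


WG_BM, WG_BN = 64, 256  # illustrative values only
print(c_tile_offset(0, 5, WG_BM, WG_BN))  # 5
print(c_tile_offset(3, 1, WG_BM, WG_BN))  # 769
```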

Methods

a_tiles

def a_tiles(ref[AddressSpace._value] self) -> Self.ATileArray

Get A tile array accessor (TileTensor-based).

Returns:

Self.ATileArray

b_tiles

def b_tiles(ref[AddressSpace._value] self) -> Self.BTileArray

Get B tile array accessor (TileTensor-based).

Returns:

Self.BTileArray

c_tile

def c_tile(ref[AddressSpace._value] self) -> Self.CTile

Get C tile accessor (TileTensor-based).

Returns:

Self.CTile

create_pipeline

def create_pipeline(ref[AddressSpace._value] self) -> ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]

Create a producer-consumer pipeline from the barrier storage.

Returns:

ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]
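ProducerConsumerPipeline is parameterized by num_pipeline_stages // k_group_size stages. The Python sketch below is a conceptual model only, not this library's API. It treats a multi-stage pipeline as a ring buffer: the slot index wraps around, and a phase bit flips on each wrap. This phase bit is how mbarrier-based pipelines typically tell successive uses of the same slot apart. The class is hypothetical.

```python
class RingPipelineModel:
    """Conceptual model of a multi-stage producer-consumer ring buffer.

    Tracks only the next stage slot and a phase bit that flips each time
    the slot index wraps around; it performs no synchronization.
    """

    def __init__(self, num_stages: int) -> None:
        if num_stages <= 0:
            raise ValueError("num_stages must be positive")
        self.num_stages = num_stages
        self.index = 0
        self.phase = 0

    def step(self) -> tuple:
        # Return the current (slot, phase), then advance to the next slot.
        current = (self.index, self.phase)
        self.index += 1
        if self.index == self.num_stages:
            self.index = 0
            self.phase ^= 1  # wrapping around flips the phase bit
        return current


# num_pipeline_stages = 4 and k_group_size = 2 (illustrative) give 2 pipeline stages.
p = RingPipelineModel(4 // 2)
print([p.step() for _ in range(5)])
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
```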

pipeline_storage_size

static def pipeline_storage_size() -> Int

Calculate the memory size for all pipeline stages.

Returns:

Int

output_storage_size

static def output_storage_size() -> Int

Calculate the memory size of the output tile.

Returns:

Int

storage_size

static def storage_size() -> Int

Calculate the total storage size.

Returns:

Int
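The three static size methods can be approximated with simple arithmetic. The Python sketch below rests on several assumptions not confirmed by this page:

- Each stage holds one BM x BK A tile and one BN x BK B tile.
- The output is a single WG_BM x WG_BN C tile.
- Each grouped stage uses two 8-byte mbarriers.
- Alignment padding is ignored.

Whether the real storage_size also counts barriers is not stated here. All names are hypothetical.

```python
MBARRIER_BYTES = 8  # a Hopper mbarrier object is 64 bits


def pipeline_storage_size(bm: int, bn: int, bk: int, num_stages: int, a_bytes: int, b_bytes: int) -> int:
    # Assumption: each stage holds one BM x BK A tile and one BN x BK B tile.
    return num_stages * (bm * bk * a_bytes + bn * bk * b_bytes)


def output_storage_size(wg_bm: int, wg_bn: int, c_bytes: int) -> int:
    # CTileArray holds a single WG_BM x WG_BN tile.
    return wg_bm * wg_bn * c_bytes


def barrier_storage_size(num_stages: int, k_group_size: int) -> int:
    # Assumption: two mbarriers (e.g. "full" and "empty") per grouped stage.
    return (num_stages // k_group_size) * 2 * MBARRIER_BYTES


def storage_size(bm, bn, bk, wg_bm, wg_bn, num_stages, k_group_size, a_bytes, b_bytes, c_bytes) -> int:
    # Sum of the parts; ignores any alignment padding the real layout may add.
    return (pipeline_storage_size(bm, bn, bk, num_stages, a_bytes, b_bytes)
            + output_storage_size(wg_bm, wg_bn, c_bytes)
            + barrier_storage_size(num_stages, k_group_size))


# Illustrative bf16 configuration (2 bytes per element), not a recommended default.
print(pipeline_storage_size(128, 128, 64, 4, 2, 2))         # 131072
print(output_storage_size(64, 128, 2))                      # 16384
print(storage_size(128, 128, 64, 64, 128, 4, 1, 2, 2, 2))   # 147520
```

A total like this is useful to check against the per-block shared memory limit of the target GPU before choosing num_pipeline_stages.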