
Mojo struct

HopperMatmulSM90Kernel_SMem

struct HopperMatmulSM90Kernel_SMem[a_type: DType, b_type: DType, c_type: DType, BM: Int, BN: Int, BK: Int, WG_BM: Int, WG_BN: Int, num_pipeline_stages: Int, k_group_size: Int, swizzle_bytes: Int = 128]

Shared memory layout for Hopper SM90 matrix multiplication kernel.

This struct manages the shared memory allocation for:

- Input tiles (A and B matrices) with multi-stage pipelining
- Output tile (C matrix) for accumulation
- Synchronization barriers for producer-consumer coordination

The memory is organized to support asynchronous loads and efficient bank-conflict-free access patterns for tensor core operations.

All tiles use TileTensor-based types from `tile_types.mojo`. At TMA/WGMMA boundaries, pass `{tile.ptr}` to construct the tile view.

Fields

- `a_tiles_storage` (`Self.ATileArray.Storage`): backing storage for the multi-stage A tile array.
- `b_tiles_storage` (`Self.BTileArray.Storage`): backing storage for the multi-stage B tile array.
- `c_tile_storage` (`Self.CTileArray.Storage`): backing storage for the single C output tile.
- `barriers` (`BarrierPair[(num_pipeline_stages // k_group_size)]`): barrier storage for producer-consumer synchronization.
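The number of barrier pairs comes directly from the `barriers` type parameter. The same value parameterizes the `ProducerConsumerPipeline` returned by `create_pipeline`. The division is Mojo's floor division `//`. Here is a worked example with illustrative values, which are not defaults taken from this page:

```latex
N_{\text{pairs}} = \left\lfloor \frac{\text{num\_pipeline\_stages}}{\text{k\_group\_size}} \right\rfloor,
\qquad \left\lfloor \tfrac{4}{2} \right\rfloor = 2,
\qquad \left\lfloor \tfrac{4}{1} \right\rfloor = 4,
\qquad \left\lfloor \tfrac{5}{2} \right\rfloor = 2
```

In the last case, floor division discards the remainder. This suggests, though the page does not state it, that `num_pipeline_stages` is expected to be a multiple of `k_group_size`.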

Implemented traits

AnyType, ImplicitlyDeletable

`comptime` members

`ATileArray`

comptime ATileArray = SMemTileArrayWithLayout[a_type, Layout(Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt())), Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt()))), num_pipeline_stages]

`BTileArray`

comptime BTileArray = SMemTileArrayWithLayout[b_type, Layout(Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt())), Coord(Coord(ComptimeInt(), ComptimeInt()), Coord(ComptimeInt(), ComptimeInt()))), num_pipeline_stages]

`CTile`

comptime CTile = HopperMatmulSM90Kernel_SMem[a_type, b_type, c_type, BM, BN, BK, WG_BM, WG_BN, num_pipeline_stages, k_group_size, swizzle_bytes].CTileArray.Tile

`CTileArray`

comptime CTileArray = SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, 1]

Methods

`a_tiles`

def a_tiles(ref[AddressSpace._value] self) -> Self.ATileArray

Get A tile array accessor (TileTensor-based).

Returns:

Self.ATileArray

`b_tiles`

def b_tiles(ref[AddressSpace._value] self) -> Self.BTileArray

Get B tile array accessor (TileTensor-based).

Returns:

Self.BTileArray

`c_tile`

def c_tile(ref[AddressSpace._value] self) -> Self.CTile

Get C tile accessor (TileTensor-based).

Returns:

Self.CTile

`create_pipeline`

def create_pipeline(ref[AddressSpace._value] self) -> ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]

Create producer-consumer pipeline from barrier storage.

Returns:

ProducerConsumerPipeline[(num_pipeline_stages // k_group_size)]

`pipeline_storage_size`

static def pipeline_storage_size() -> Int

Calculate the memory size for all pipeline stages.

Returns:

Int
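The following is a rough, assumption-laden estimate of what `pipeline_storage_size()` accounts for. The A and B tile layouts above are elided, so this sketch assumes the following:

- Each stage holds one BM × BK tile of `a_type` and one BN × BK tile of `b_type`, a common Hopper GEMM arrangement.
- The result is in bytes.
- $s_a$ and $s_b$ are the element sizes of `a_type` and `b_type`.

The illustrative values are BM = BN = 128, BK = 64, 2-byte (bf16) inputs, and 4 stages:

```latex
\begin{aligned}
S_{\text{pipeline}} &= \text{num\_pipeline\_stages} \cdot \left( BM \cdot BK \cdot s_a + BN \cdot BK \cdot s_b \right) \\
&= 4 \cdot \left( 128 \cdot 64 \cdot 2 + 128 \cdot 64 \cdot 2 \right) \\
&= 4 \cdot 32768 = 131072 \text{ bytes} = 128\ \text{KiB}
\end{aligned}
```

Whether the actual method also counts barrier storage or swizzle/alignment padding is not stated on this page.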

`output_storage_size`

static def output_storage_size() -> Int

Calculate the memory size for the output tile.

Returns:

Int
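`CTileArray` is defined as `SMemTileArray2DRowMajor[c_type, WG_BM, WG_BN, 1]`, which describes a single row-major WG_BM × WG_BN tile. On that basis, the output size can be estimated as follows, assuming the result is in bytes and $s_c$ is the element size of `c_type`. The illustrative values are WG_BM = WG_BN = 128 with a 2-byte (bf16) output:

```latex
\begin{aligned}
S_{\text{output}} &= 1 \cdot \text{WG\_BM} \cdot \text{WG\_BN} \cdot s_c \\
&= 128 \cdot 128 \cdot 2 = 32768 \text{ bytes} = 32\ \text{KiB}
\end{aligned}
```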

`storage_size`

static def storage_size() -> Int

Calculate the total storage size.

Returns:

Int
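The total can be sketched as the sum of the pieces, under the following assumptions, none of which are confirmed by this page:

- The total is pipeline tiles + output tile + barriers, in bytes, ignoring any alignment padding.
- `BarrierPair[N]` holds two mbarrier objects per entry. An mbarrier occupies 8 bytes of shared memory on Hopper.

The example uses illustrative figures:

- 4 stages of 128 × 64 bf16 A and B tiles: 131072 bytes.
- One 128 × 128 bf16 C tile: 32768 bytes.
- N = 4 // 2 = 2 barrier pairs.

```latex
\begin{aligned}
S_{\text{total}} &\approx S_{\text{pipeline}} + S_{\text{output}} + 2 \cdot 8 \cdot N_{\text{pairs}} \\
&\approx 131072 + 32768 + 32 = 163872 \text{ bytes} \approx 160\ \text{KiB}
\end{aligned}
```

A figure like this should stay under the 227 KB per-block shared memory limit of H100 (compute capability 9.0). Raising `num_pipeline_stages` is usually the first thing that pushes against that limit.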
