Mojo module
pipeline_storage
Unified Pipeline Storage Framework for SM100 Structured Kernels.
This module provides a single-source-of-truth framework for pipeline storage, where stage count determines barrier count, and tile storage type determines the SMEM layout for input tiles.
All tile storage uses TileTensor natively. Conversion to LayoutTensor only happens at external API boundaries (TMA, MMA) using the {ptr} syntax or explicit LayoutTensor construction.
Design Principles
- Single Source of Truth: Stage count parameterizes barrier count
- Single Source of Truth: Tile storage types define array types once
- TileTensor Native: All SMEM tiles use TileTensor
- Composable: SMEM structs compose storage objects
- Extensible: Easy to add new storage types
- Escape Hatch: Raw storage access when framework doesn't fit
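The first principle can be illustrated with a small model. The sketch below is plain Python rather than Mojo and does not use this module's API. `ToyBarrierPair` and `ToyInputPipelineStorage` are hypothetical stand-ins. The assumption that a barrier pair holds one full and one empty barrier per stage is inferred from the BarrierPair description, not taken from its implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBarrierPair:
    """Hypothetical stand-in for BarrierPair[num_stages]."""

    num_stages: int
    full: list = field(init=False)
    empty: list = field(init=False)

    def __post_init__(self):
        # Assumption: one "full" and one "empty" barrier per stage.
        self.full = [f"full[{i}]" for i in range(self.num_stages)]
        self.empty = [f"empty[{i}]" for i in range(self.num_stages)]

    @property
    def num_barriers(self) -> int:
        # Derived from the stage count; never stated separately.
        return len(self.full) + len(self.empty)


@dataclass
class ToyInputPipelineStorage:
    """Hypothetical stand-in: the stage count is given once, barriers follow."""

    num_stages: int
    barriers: ToyBarrierPair = field(init=False)

    def __post_init__(self):
        self.barriers = ToyBarrierPair(self.num_stages)


storage = ToyInputPipelineStorage(num_stages=4)
print(storage.barriers.num_barriers)
```

Because the barrier count is computed from `num_stages`, changing the stage count in one place cannot leave the barrier array with a stale size.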
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ Tile Storage (defines tile arrays and storage)                      │
│                                                                     │
│   StandardTileStorage[a_type, b_type, a_shape, b_shape, ...]        │
│   ├── ATileArray = SMemTileArray2D[...]  # TileTensor-based         │
│   ├── BTileArray = SMemTileArray2D[...]  # TileTensor-based         │
│   ├── var a_tiles_storage                                           │
│   ├── var b_tiles_storage                                           │
│   └── def a_tiles(), b_tiles()           # Returns TileTensor       │
│                                                                     │
│   BlockScaledTileStorage[..., sfa_type, sfb_type, dims, ...]        │
│   BlockwiseFP8TileStorage[..., a_scales_type, dims, ...]            │
│   OutputTileStorage[c_type, c_layout, num_stages]                   │
├─────────────────────────────────────────────────────────────────────┤
│ Pipeline Storage (defines barriers)                                 │
│                                                                     │
│   InputPipelineStorage[num_stages, Payload]                         │
│   └── var barriers: BarrierPair[num_stages]                         │
│                                                                     │
│   OutputPipelineStorage[num_stages]                                 │
│   ClcPipelineStorage[num_stages]                                    │
│   TmemDeallocStorage                                                │
├─────────────────────────────────────────────────────────────────────┤
│ SMEM composes both:                                                 │
│                                                                     │
│   struct MySmem:                                                    │
│       var tiles: StandardTileStorage[...]            # Tile storage │
│       var output_tiles: OutputTileStorage[...]       # Output tiles │
│       var input_pipeline: InputPipelineStorage[...]  # Barriers     │
│       var output_pipeline: OutputPipelineStorage[...]               │
│       var clc_pipeline: ClcPipelineStorage[...]                     │
└─────────────────────────────────────────────────────────────────────┘
Example Usage
struct MyKernelSmem[config: MyConfig]:
    # Tile storage (single source of truth for tile types)
    comptime Tiles = StandardTileStorage[
        config.a_type, config.b_type,
        IndexList[2](config.BM, config.BK),  # A tile shape
        IndexList[2](config.BN, config.BK),  # B tile shape
        config.num_pipeline_stages,
    ]
    var tiles: Self.Tiles

    # Output tile storage (separate stage count)
    comptime OutputTiles = OutputTileStorage[
        config.c_type, config.c_layout, config.num_output_stages
    ]
    var output_tiles: Self.OutputTiles

    # Pipeline storage (barriers)
    var input_pipeline: InputPipelineStorage[...]
    var output_pipeline: OutputPipelineStorage[...]

    # Accessors delegate to composed storage
    def a_tiles(ref[SHARED] self) -> Self.Tiles.ATileArray:
        return self.tiles.a_tiles()  # Returns TileTensor

    def c_tiles(ref[SHARED] self) -> Self.OutputTiles.CTileArray:
        return self.output_tiles.c_tiles()
Extensibility
To add a new tile storage type:
- Create a new struct with comptime type aliases and storage fields
- Add accessors that construct tile arrays from storage
- Use in SMEM via composition
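The three steps above can be sketched with a toy model. This is plain Python rather than Mojo and is not the module's API. `ToyScaleTileStorage` and `ToySmem` are hypothetical names, and the flat list stands in for a shared-memory buffer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScaleTileStorage:
    """Hypothetical new storage type: one scale tile per pipeline stage."""

    # Step 1: a type alias defined once, plus a storage field.
    TileArray = list

    tile_shape: tuple  # (rows, cols) of a single tile
    num_stages: int
    scales_storage: list = field(init=False)

    def __post_init__(self):
        rows, cols = self.tile_shape
        # Flat zero-initialised buffer sized from shape and stage count.
        self.scales_storage = [0.0] * (rows * cols * self.num_stages)

    # Step 2: an accessor that builds the tile array from raw storage.
    def scale_tiles(self) -> list:
        rows, cols = self.tile_shape
        size = rows * cols
        # In this toy model the slices are copies, not views.
        return [
            self.scales_storage[s * size:(s + 1) * size]
            for s in range(self.num_stages)
        ]


@dataclass
class ToySmem:
    """Step 3: the SMEM struct composes the storage and delegates to it."""

    scales: ToyScaleTileStorage

    def scale_tiles(self) -> ToyScaleTileStorage.TileArray:
        return self.scales.scale_tiles()


smem = ToySmem(ToyScaleTileStorage(tile_shape=(2, 4), num_stages=3))
tiles = smem.scale_tiles()
print(len(tiles), len(tiles[0]))
```

The composing struct never repeats the tile shape or stage count. It only forwards to the storage object, which is the pattern the Example Usage accessors follow.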
Escape Hatch
When the framework doesn't fit:
- Use raw SMemArray for custom tile layouts
- Use RawBarrierStorage for non-standard barrier patterns
- Add custom storage fields to SMEM struct
comptime values
MbarPtr
comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]
Structs
- BarrierPair: Storage for a producer-consumer barrier pair (full + empty).
- BlockScaledTileStorage: Storage for block-scaled matmul tiles (A, B, C, SFA, SFB).
- BlockwiseFP8TileStorage: Storage for blockwise FP8 matmul tiles (A, B, C, A-scales).
- ClcPipelineStorage: Storage for CLC (Cluster Launch Control) scheduler pipeline.
- EpiLoadPipelineStorage: Storage for epilogue load pipeline (source C loading).
- InputPipelineStorage: Unified storage for input tile pipeline (barriers + payload).
- LoadOrderBarrierStorage: Storage for load order barrier (mainloop → epilogue load coordination).
- OutputPipelineStorage: Unified storage for output/accumulator pipeline.
- OutputTileStorage: Storage for output tiles (C matrix).
- RawBarrierStorage: Escape hatch: Raw barrier storage for custom patterns.
- SmemLayouts: Common SMEM layout definitions for matmul-family kernels.
- SmemPipelineBundle: Composed pipeline storage with unified barrier accessors.
- SmemPipelineBundleNoClc: Composed pipeline storage without CLC scheduler.
- SourceTileStorage: Storage for source tensor C tiles (residual/skip connection input).
- StandardTileStorage: Storage for standard matmul tiles (A and B).
- TmemDeallocStorage: Storage for TMEM deallocation synchronization.