Mojo module
pipeline_storage
Unified Pipeline Storage Framework for SM100 Structured Kernels.
This module provides a single-source-of-truth framework for pipeline storage: the stage count determines the barrier count, and the tile storage type determines the SMEM layout of the input tiles.
All tile storage uses TileTensor natively. Conversion to LayoutTensor happens only at external API boundaries (TMA, MMA), using either the {ptr} syntax or explicit LayoutTensor construction.
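The Python sketch below models the arithmetic behind the single-source-of-truth claim. It is a host-side model and not the Mojo API. The class names are hypothetical, and the sketch makes two assumptions: each stage has one full and one empty barrier (a BarrierPair), and tiles are densely packed with no padding or alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputPipelineModel:
    """Hypothetical model: the barrier count is derived from num_stages only."""

    num_stages: int

    @property
    def num_barriers(self) -> int:
        # Assumed: one "full" and one "empty" barrier per stage.
        return 2 * self.num_stages


@dataclass(frozen=True)
class StandardTilesModel:
    """Hypothetical model: tile dims, element sizes, and stage count fix the footprint."""

    a_dim0: int
    a_dim1: int
    b_dim0: int
    b_dim1: int
    a_elem_bytes: int
    b_elem_bytes: int
    num_stages: int

    @property
    def a_bytes(self) -> int:
        # One A tile per stage, densely packed (no padding assumed).
        return self.num_stages * self.a_dim0 * self.a_dim1 * self.a_elem_bytes

    @property
    def b_bytes(self) -> int:
        # One B tile per stage, densely packed (no padding assumed).
        return self.num_stages * self.b_dim0 * self.b_dim1 * self.b_elem_bytes


# Both models take the same stage count, so changing it in one place
# updates the barrier count and the tile footprint together.
stages = 4
pipeline = InputPipelineModel(num_stages=stages)
tiles = StandardTilesModel(128, 64, 128, 64, a_elem_bytes=2, b_elem_bytes=2, num_stages=stages)
print(pipeline.num_barriers)          # 8
print(tiles.a_bytes + tiles.b_bytes)  # 131072
```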
Design Principles
- Single Source of Truth: Stage count parameterizes barrier count
- Single Source of Truth: Tile storage types define array types once
- TileTensor Native: All SMEM tiles use TileTensor
- Composable: SMEM structs compose storage objects
- Extensible: Easy to add new storage types
- Escape Hatch: Raw storage access when the framework doesn't fit
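The sketch below illustrates the "Composable" principle with a small Python model of an SMEM struct assembled from storage members in declaration order. It does not reflect how the Mojo compiler actually lays out struct fields. `Member`, `layout`, the sizes, and the alignments are all hypothetical, and each barrier is assumed to take 8 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One composed storage object inside a hypothetical SMEM struct."""

    name: str
    size_bytes: int
    align: int  # hypothetical alignment requirement (power of two)


def align_up(offset: int, align: int) -> int:
    # Round offset up to the next multiple of align; align must be a power of two.
    return (offset + align - 1) & ~(align - 1)


def layout(members: list[Member]) -> tuple[dict[str, int], int]:
    """Assign each member an offset in declaration order; return (offsets, total bytes)."""
    offsets: dict[str, int] = {}
    cursor = 0
    for m in members:
        cursor = align_up(cursor, m.align)
        offsets[m.name] = cursor
        cursor += m.size_bytes
    return offsets, cursor


members = [
    Member("tiles", 131072, 1024),        # A + B: 4 stages of 128x64 2-byte elements each
    Member("output_tiles", 32768, 1024),  # C tiles with their own stage count
    Member("input_pipeline", 8 * 8, 8),   # 4 stages x 2 barriers x 8 bytes (assumed)
    Member("output_pipeline", 4 * 8, 8),  # hypothetical 4 output barriers
]
offsets, total = layout(members)
print(offsets["input_pipeline"], total)  # 163840 163936
```

Each member's size is derived from that member's own parameters, so the total follows mechanically. A raw field added through the escape hatch would simply become one more entry.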
Architecture
```
┌──────────────────────────────────────────────────────────────────────┐
│ Tile Storage (defines tile arrays and storage)                       │
│                                                                      │
│   StandardTileStorage[a_type, b_type, a_dim0, a_dim1, b_dim0, ...]   │
│   ├── ATileArray = SMemTileArray2D[...]   # TileTensor-based         │
│   ├── BTileArray = SMemTileArray2D[...]   # TileTensor-based         │
│   ├── var a_tiles_storage                                            │
│   ├── var b_tiles_storage                                            │
│   └── fn a_tiles(), b_tiles()             # Returns TileTensor       │
│                                                                      │
│   BlockScaledTileStorage[..., sfa_type, sfb_type, dims, ...]         │
│   BlockwiseFP8TileStorage[..., a_scales_type, dims, ...]             │
│   OutputTileStorage[c_type, c_layout, num_stages]                    │
├──────────────────────────────────────────────────────────────────────┤
│ Pipeline Storage (defines barriers)                                  │
│                                                                      │
│   InputPipelineStorage[num_stages, Payload]                          │
│   └── var barriers: BarrierPair[num_stages]                          │
│                                                                      │
│   OutputPipelineStorage[num_stages]                                  │
│   ClcPipelineStorage[num_stages]                                     │
│   TmemDeallocStorage                                                 │
├──────────────────────────────────────────────────────────────────────┤
│ SMEM composes both:                                                  │
│                                                                      │
│   struct MySmem:                                                     │
│       var tiles: StandardTileStorage[...]            # Tile storage  │
│       var output_tiles: OutputTileStorage[...]       # Output tiles  │
│       var input_pipeline: InputPipelineStorage[...]  # Barriers      │
│       var output_pipeline: OutputPipelineStorage[...]                │
│       var clc_pipeline: ClcPipelineStorage[...]                      │
└──────────────────────────────────────────────────────────────────────┘
```
Example Usage
```mojo
struct MyKernelSmem[config: MyConfig]:
    # Tile storage (single source of truth for tile types)
    comptime Tiles = StandardTileStorage[
        config.a_type, config.b_type,
        config.BM, config.BK,  # A tile dimensions
        config.BN, config.BK,  # B tile dimensions
        config.num_pipeline_stages,
    ]
    var tiles: Self.Tiles

    # Output tile storage (separate stage count)
    comptime OutputTiles = OutputTileStorage[
        config.c_type, config.c_layout, config.num_output_stages
    ]
    var output_tiles: Self.OutputTiles

    # Pipeline storage (barriers)
    var input_pipeline: InputPipelineStorage[...]
    var output_pipeline: OutputPipelineStorage[...]

    # Accessors delegate to composed storage
    fn a_tiles(ref[SHARED] self) -> Self.Tiles.ATileArray:
        return self.tiles.a_tiles()  # Returns TileTensor

    fn c_tiles(ref[SHARED] self) -> Self.OutputTiles.CTileArray:
        return self.output_tiles.c_tiles()
```
Extensibility
To add a new tile storage type:
- Create a new struct with comptime type aliases and storage fields
- Add accessors that construct tile arrays from storage
- Use it in an SMEM struct via composition
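The three steps above might look like the following Python model. `BiasTileStorageModel` and its parameters are hypothetical, since this page names no such type. In the model:

- Class attributes stand in for comptime parameters.
- `memoryview` slices stand in for tile views that alias the storage rather than copying it.
- Stage selection with `k % NUM_STAGES` is assumed ring-buffer indexing.

```python
class BiasTileStorageModel:
    """Hypothetical new tile storage type following the three steps above."""

    # Step 1: "comptime" parameters, modeled as class attributes.
    TILE_ELEMS = 128  # hypothetical elements per tile
    ELEM_BYTES = 4    # hypothetical element size (e.g. a 32-bit type)
    NUM_STAGES = 3

    def __init__(self) -> None:
        # Step 1: one storage field sized entirely from the parameters above.
        self._storage = bytearray(self.NUM_STAGES * self.TILE_ELEMS * self.ELEM_BYTES)

    # Step 2: accessor that builds a per-stage view over the raw storage.
    def tile(self, stage: int) -> memoryview:
        if not 0 <= stage < self.NUM_STAGES:
            raise IndexError(stage)
        tile_bytes = self.TILE_ELEMS * self.ELEM_BYTES
        start = stage * tile_bytes
        return memoryview(self._storage)[start:start + tile_bytes]


class MySmemModel:
    """Step 3: compose the new storage into an SMEM-like container."""

    def __init__(self) -> None:
        self.bias_tiles = BiasTileStorageModel()


smem = MySmemModel()
k = 7  # pipeline iteration
view = smem.bias_tiles.tile(k % BiasTileStorageModel.NUM_STAGES)  # stage 1
view[0] = 0xFF  # writes through to the composed storage; no copy is made
```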
Escape Hatch
When the framework doesn't fit:
- Use raw SMemArray for custom tile layouts
- Use RawBarrierStorage for non-standard barrier patterns
- Add custom storage fields to the SMEM struct
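A standard full/empty pair is not enough when a stage has several consumer groups that each release the slot separately. The Python sketch below models that case as a flat raw barrier array with a hand-written index scheme. The scheme, `RawBarrierModel`, and the counts are hypothetical; they do not describe the layout RawBarrierStorage actually uses.

```python
class RawBarrierModel:
    """Hypothetical raw barrier array: N slots with no built-in meaning."""

    def __init__(self, num_stages: int, num_consumers: int) -> None:
        self.num_stages = num_stages
        self.num_consumers = num_consumers
        # Per stage: one "full" barrier plus one "empty" barrier per consumer group.
        self.per_stage = 1 + num_consumers
        self.slots = [0] * (num_stages * self.per_stage)

    def full_index(self, stage: int) -> int:
        # The first slot of each stage's group is its "full" barrier.
        return stage * self.per_stage

    def empty_index(self, stage: int, consumer: int) -> int:
        # The remaining slots are one "empty" barrier per consumer group.
        if not 0 <= consumer < self.num_consumers:
            raise IndexError(consumer)
        return stage * self.per_stage + 1 + consumer


raw = RawBarrierModel(num_stages=4, num_consumers=2)
print(len(raw.slots))                            # 12
print(raw.full_index(2), raw.empty_index(2, 1))  # 6 8
```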
comptime values
MbarPtr
comptime MbarPtr = LegacyUnsafePointer[SharedMemBarrier, address_space=AddressSpace.SHARED]
Structs
- BarrierPair: Storage for a producer-consumer barrier pair (full + empty).
- BlockScaledTileStorage: Storage for block-scaled matmul tiles (A, B, C, SFA, SFB).
- BlockwiseFP8TileStorage: Storage for blockwise FP8 matmul tiles (A, B, C, A-scales).
- ClcPipelineStorage: Storage for CLC (Cluster Launch Control) scheduler pipeline.
- EpiLoadPipelineStorage: Storage for epilogue load pipeline (source C loading).
- InputPipelineStorage: Unified storage for input tile pipeline (barriers + payload).
- LoadOrderBarrierStorage: Storage for load order barrier (mainloop → epilogue load coordination).
- OutputPipelineStorage: Unified storage for output/accumulator pipeline.
- OutputTileStorage: Storage for output tiles (C matrix).
- RawBarrierStorage: Escape hatch: Raw barrier storage for custom patterns.
- SmemLayouts: Common SMEM layout definitions for matmul-family kernels.
- SmemPipelineBundle: Composed pipeline storage with unified barrier accessors.
- SmemPipelineBundleNoClc: Composed pipeline storage without CLC scheduler.
- SourceTileStorage: Storage for source tensor C tiles (residual/skip connection input).
- StandardTileStorage: Storage for standard matmul tiles (A and B).
- TmemDeallocStorage: Storage for TMEM deallocation synchronization.
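BarrierPair, which InputPipelineStorage uses, is built around a full/empty handshake between a producer and a consumer. The Python sketch below models only the ordering of that handshake across a ring of stages, using `threading.Semaphore`. Hardware mbarriers typically work with arrival counts and phase bits instead, and nothing here is the Mojo API. `StageBarrierPair` and `run_pipeline` are hypothetical names.

```python
import threading


class StageBarrierPair:
    """Host-side stand-in for one stage's full/empty barriers, built on semaphores."""

    def __init__(self) -> None:
        self.full = threading.Semaphore(0)   # released once the producer has filled the slot
        self.empty = threading.Semaphore(1)  # released once the consumer has drained the slot


def run_pipeline(num_stages: int, num_tiles: int) -> list[int]:
    slots = [None] * num_stages
    barriers = [StageBarrierPair() for _ in range(num_stages)]
    consumed: list[int] = []

    def producer() -> None:
        for k in range(num_tiles):
            stage = k % num_stages
            barriers[stage].empty.acquire()  # wait until the slot is free
            slots[stage] = k                 # "load" tile k into the slot
            barriers[stage].full.release()   # signal the consumer

    def consumer() -> None:
        for k in range(num_tiles):
            stage = k % num_stages
            barriers[stage].full.acquire()   # wait until the slot holds data
            consumed.append(slots[stage])    # "compute" on the tile
            barriers[stage].empty.release()  # hand the slot back to the producer

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


print(run_pipeline(num_stages=3, num_tiles=10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Tiles arrive in order even though the producer can run up to `num_stages` tiles ahead. This slack is what the stage count buys, and it is why the barrier count scales with it.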