
Mojo module

pipeline_storage

Unified Pipeline Storage Framework for SM100 Structured Kernels.

This module provides a single-source-of-truth framework for pipeline storage: the stage count determines the barrier count, and the tile storage type determines the shared memory (SMEM) layout for input tiles.

All tile storage uses TileTensor natively. Conversion to LayoutTensor happens only at external API boundaries (TMA, MMA), using either the {ptr} syntax or explicit LayoutTensor construction.

Design Principles

  1. Single Source of Truth: Stage count parameterizes barrier count
  2. Single Source of Truth: Tile storage types define array types once
  3. TileTensor Native: All SMEM tiles use TileTensor
  4. Composable: SMEM structs compose storage objects
  5. Extensible: Easy to add new storage types
  6. Escape Hatch: Raw storage access when the framework doesn't fit
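
As a rough illustration of the first principle, the sketch below shows how a single compile-time stage count can fix the size of barrier storage. It is not this module's implementation: BarrierPairSketch is a hypothetical stand-in for BarrierPair, plain integers stand in for shared-memory barriers, and the layout of two barriers per stage is an assumption made only for the example.

```mojo
from collections import InlineArray


struct BarrierPairSketch[num_stages: Int]:
    """Hypothetical stand-in for a per-stage barrier pair (not the module's BarrierPair)."""

    # Assumption for this sketch: each stage owns two barriers (full and empty),
    # so the barrier count is derived from the stage count and never restated.
    comptime num_barriers = 2 * Self.num_stages

    # Plain integers stand in for shared-memory barrier objects.
    var slots: InlineArray[Int, Self.num_barriers]

    fn __init__(out self):
        # Zero-initialize every barrier slot.
        self.slots = InlineArray[Int, Self.num_barriers](fill=0)


def main():
    # Changing the stage count here is the only edit needed to resize the storage.
    var barriers = BarrierPairSketch[4]()
    print(barriers.num_barriers)
```

Because the storage size is computed from num_stages, a struct that composes this type cannot end up with a barrier count that disagrees with its stage count.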

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│  Tile Storage (defines tile arrays and storage)                     │
│                                                                     │
│  StandardTileStorage[a_type, b_type, a_dim0, a_dim1, b_dim0, ...]  │
│      ├── ATileArray = SMemTileArray2D[...]  # TileTensor-based     │
│      ├── BTileArray = SMemTileArray2D[...]  # TileTensor-based     │
│      ├── var a_tiles_storage                                        │
│      ├── var b_tiles_storage                                        │
│      └── fn a_tiles(), b_tiles()  # Returns TileTensor             │
│                                                                     │
│  BlockScaledTileStorage[..., sfa_type, sfb_type, dims, ...]        │
│  BlockwiseFP8TileStorage[..., a_scales_type, dims, ...]            │
│  OutputTileStorage[c_type, c_layout, num_stages]                   │
├─────────────────────────────────────────────────────────────────────┤
│  Pipeline Storage (defines barriers)                                │
│                                                                     │
│  InputPipelineStorage[num_stages, Payload]                         │
│      └── var barriers: BarrierPair[num_stages]                     │
│                                                                     │
│  OutputPipelineStorage[num_stages]                                 │
│  ClcPipelineStorage[num_stages]                                    │
│  TmemDeallocStorage                                                │
├─────────────────────────────────────────────────────────────────────┤
│  SMEM composes both:                                                │
│                                                                     │
│  struct MySmem:                                                     │
│      var tiles: StandardTileStorage[...]      # Tile storage       │
│      var output_tiles: OutputTileStorage[...] # Output tiles       │
│      var input_pipeline: InputPipelineStorage[...]  # Barriers     │
│      var output_pipeline: OutputPipelineStorage[...]                │
│      var clc_pipeline: ClcPipelineStorage[...]                     │
└─────────────────────────────────────────────────────────────────────┘

Example Usage

struct MyKernelSmem[config: MyConfig]:
    # Tile storage (single source of truth for tile types)
    comptime Tiles = StandardTileStorage[
        config.a_type, config.b_type,
        config.BM, config.BK,  # A tile dimensions
        config.BN, config.BK,  # B tile dimensions
        config.num_pipeline_stages,
    ]
    var tiles: Self.Tiles

    # Output tile storage (separate stage count)
    comptime OutputTiles = OutputTileStorage[
        config.c_type, config.c_layout, config.num_output_stages
    ]
    var output_tiles: Self.OutputTiles

    # Pipeline storage (barriers)
    var input_pipeline: InputPipelineStorage[...]
    var output_pipeline: OutputPipelineStorage[...]

    # Accessors delegate to composed storage
    fn a_tiles(ref[SHARED] self) -> Self.Tiles.ATileArray:
        return self.tiles.a_tiles()  # Returns TileTensor

    fn c_tiles(ref[SHARED] self) -> Self.OutputTiles.CTileArray:
        return self.output_tiles.c_tiles()

Extensibility

To add a new tile storage type:

  1. Create a new struct with comptime type aliases and storage fields
  2. Add accessors that construct tile arrays from storage
  3. Use in SMEM via composition

Escape Hatch

When the framework doesn't fit:

  1. Use raw SMemArray for custom tile layouts
  2. Use RawBarrierStorage for non-standard barrier patterns
  3. Add custom storage fields to the SMEM struct

comptime values

MbarPtr

comptime MbarPtr = LegacyUnsafePointer[SharedMemBarrier, address_space=AddressSpace.SHARED]

Structs
