
Mojo module

tile_pipeline

Tile pipeline for SM100 producer-consumer synchronization.

Provides staged tile storage with producer-consumer barrier synchronization for TMA-MMA pipeline coordination. Two complementary APIs offer either automatic scoped cleanup (context managers) or compiler-enforced explicit cleanup (linear types).

All tiles use TileTensor natively.

Key Abstractions

  • InputTilePipeline[Payload]: Generic pipeline with payload abstraction
  • OutputTilePipeline: TMEM accumulator stages for MMA→Epilogue pipeline

Type Architecture: Two-Type Design

Input pipeline access uses a two-type design that separates compiler-enforced linear types from ergonomic context wrappers:

┌─────────────────────────────┐   ┌─────────────────────────────┐
│   InputProducerStage        │   │   InputConsumerStage        │
│   @explicit_destroy,Movable │   │   @explicit_destroy,Movable │
│   release(deinit self)      │   │   release(deinit self)      │
│   Compiler enforces release │   │   Compiler enforces release │
└─────────────────────────────┘   └─────────────────────────────┘

┌─────────────────────────────┐   ┌─────────────────────────────┐
│   ProducerTiles             │   │   ConsumerTiles             │
│   TrivialRegisterPassable   │   │   TrivialRegisterPassable   │
│   __enter__/__exit__        │   │   __enter__/__exit__        │
│   Auto-releases on exit     │   │   Auto-releases on exit     │
└─────────────────────────────┘   └─────────────────────────────┘

Both type pairs share the same accessor interface (payload, stage, barrier/mbar, expect_bytes). They differ in ownership semantics:

  • Stage types (InputProducerStage, InputConsumerStage): @explicit_destroy linear types. The compiler errors if you forget to call release(). Use for flat code with explicit resource management.

  • Context types (ProducerTiles, ConsumerTiles): TrivialRegisterPassable wrappers. Auto-release via __exit__ when used in a with block. Use for scoped, automatic resource management.
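
The ownership difference between the two bullets above can be illustrated with a minimal sketch. ToyStage and ToyTiles are hypothetical stand-ins, not the module's real types, and the sketch assumes a Mojo toolchain recent enough to support @explicit_destroy and the deinit argument convention named on this page. The real types signal barriers where this sketch only prints.

```mojo
# Toy stand-ins for illustration only; not the module's real types.


@explicit_destroy
struct ToyStage(Movable):
    """Linear handle: it must be consumed by calling release()."""

    var index: Int

    fn __init__(out self, index: Int):
        self.index = index

    fn release(deinit self):
        # Consumes the value. A real stage would signal its barrier here.
        print("explicitly released stage", self.index)


struct ToyTiles:
    """Context wrapper: the release step runs in __exit__."""

    var index: Int

    fn __init__(out self, index: Int):
        self.index = index

    fn __enter__(self) -> Int:
        # Hand the stage index to the body of the `with` block.
        return self.index

    fn __exit__(self):
        # Runs when the `with` block ends. A real wrapper would release here.
        print("auto-released stage", self.index)


fn main():
    # Linear style: the value has to be transferred into release().
    var stage = ToyStage(0)
    stage^.release()

    # Context style: cleanup happens when the block exits.
    with ToyTiles(1) as index:
        print("using stage", index)
```

Deleting the stage^.release() line is expected to turn into a compile error, because an @explicit_destroy value has no implicit destructor. The with block needs no matching call.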

Acquire API

Role handles (InputProducer, InputConsumer) provide the primary API:

acquire()          → returns Context type (for `with` blocks)
acquire_stage()    → returns Stage type  (linear, compiler-enforced)
acquire_if_needed  → returns Context type (try-acquire pattern)
try_acquire()      → non-blocking readiness check

InputTilePipeline also exposes direct linear-type acquire methods:

acquire_producer() → InputProducerStage (linear)
acquire_consumer() → InputConsumerStage (linear)

Context Manager API (scoped, automatic)

Use acquire() for automatic cleanup via with blocks. The context type's __exit__ handles barrier signaling and stage advancement:

with producer.acquire() as tiles:   # BLOCKS until consumer frees stage
    load_tiles(tiles)                # safe to write
                                     # EXIT: advances producer automatically

with consumer.acquire() as tiles:   # BLOCKS until producer fills stage
    use_tiles(tiles)                 # safe to read
                                     # EXIT: advances consumer automatically
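
One way to picture the "stage advancement" that __exit__ performs is a ring counter over the pipeline stages. The sketch below is an assumption about the general technique, not this module's implementation: ToyPipelineState, its fields, and the phase flip on wrap-around are hypothetical names modeled on how multi-stage barrier pipelines are commonly tracked.

```mojo
# Hypothetical ring-counter model of stage advancement; illustration only.


struct ToyPipelineState:
    var index: Int  # which stage slot is current
    var phase: Int  # flips each time the ring wraps (assumed convention)
    var num_stages: Int

    fn __init__(out self, num_stages: Int):
        self.index = 0
        self.phase = 0
        self.num_stages = num_stages

    fn advance(mut self):
        # Move to the next stage slot; wrap to slot 0 and flip the phase.
        self.index += 1
        if self.index == self.num_stages:
            self.index = 0
            self.phase = 1 - self.phase


fn main():
    var state = ToyPipelineState(num_stages=3)
    for _ in range(4):
        print("stage", state.index, "phase", state.phase)
        state.advance()
```

With three stages, the fourth iteration wraps back to stage 0 with the phase flipped. In this model, the producer and the consumer would each hold their own state and advance it once per released stage.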

Linear Type API (flat, compiler-enforced)

Use acquire_stage() for explicit control with compile-time safety. The compiler errors if you forget to call release():

var tiles = producer.acquire_stage()       # BLOCKS until consumer frees stage
load_tiles(tiles)                          # safe to write
tiles^.release()                           # transfer + release (compiler enforces)

var input = pipeline.acquire_consumer()    # BLOCKS until producer fills stage
process(input)                             # safe to read
input^.release()                           # transfer + release (compiler enforces)

The ^ transfer operator is required because release(deinit self) consumes the value. Omitting it is a compile error.

When to Use Which

  • Context manager (acquire): Default choice. Most kernel code uses this. Scoped cleanup is safe, readable, and works well with nested pipelines.
  • Linear type (acquire_stage): Use when context managers create excessive nesting, or when you need explicit control over release ordering. The compiler catches forgotten releases at compile time.

Example: TMA Load Warp (context manager)

with input_pipeline.producer() as producer:
    while work_iter.has_work():
        with work_iter.next() as current:
            for i in range(num_iters):
                with producer.acquire() as tiles:
                    tma_load(tiles.a_tile(), tiles.b_tile())
    producer.drain()

Example: MMA Warp (linear types, flat)

var mma_handle = MmaHandle.create(...)
while work_iter.has_work():
    with work_iter.wait_and_advance():
        for _ in range(num_iters):
            var mma_stage = mma_handle.acquire_k_stage_linear()
            var input_tiles = input_pipeline.acquire_consumer()
            mma(input_tiles, mma_op, ...)
            input_tiles^.release()
            mma_stage^.release()
mma_handle^.release()

Example: Epilogue Warp (context manager)

with epi_ctx:
    while work_iter.has_work():
        with work_iter.next() as current:
            with output_pipeline.consumer() as output_stage:
                write_output(output_stage)

comptime values

MbarPtr

comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]

Structs
