Mojo module

tile_pipeline

Tile pipeline for SM100 producer-consumer synchronization.

Provides staged tile storage with producer-consumer barrier synchronization for TMA-MMA pipeline coordination. Two complementary APIs offer either automatic scoped cleanup (context managers) or compiler-enforced explicit cleanup (linear types).

All tiles use TileTensor natively.

Key Abstractions

  • InputTilePipeline[Payload]: Generic pipeline with payload abstraction
  • OutputTilePipeline: TMEM accumulator stages for MMA→Epilogue pipeline

Type Architecture: Two-Type Design

Input pipeline access uses a two-type design that separates compiler-enforced linear types from ergonomic context wrappers:

┌─────────────────────────────┐   ┌─────────────────────────────┐
│   InputProducerStage        │   │   InputConsumerStage        │
│   @explicit_destroy,Movable │   │   @explicit_destroy,Movable │
│   release(deinit self)      │   │   release(deinit self)      │
│   Compiler enforces release │   │   Compiler enforces release │
└─────────────────────────────┘   └─────────────────────────────┘

┌─────────────────────────────┐   ┌─────────────────────────────┐
│   ProducerTiles             │   │   ConsumerTiles             │
│   TrivialRegisterPassable   │   │   TrivialRegisterPassable   │
│   __enter__/__exit__        │   │   __enter__/__exit__        │
│   Auto-releases on exit     │   │   Auto-releases on exit     │
└─────────────────────────────┘   └─────────────────────────────┘

Both type pairs share the same accessor interface (payload, stage, barrier/mbar, expect_bytes). They differ in ownership semantics:

  • Stage types (InputProducerStage, InputConsumerStage): @explicit_destroy linear types. The compiler errors if you forget to call release(). Use for flat code with explicit resource management.

  • Context types (ProducerTiles, ConsumerTiles): TrivialRegisterPassable wrappers. Auto-release via __exit__ when used in a with block. Use for scoped, automatic resource management.

Acquire API

Role handles (InputProducer, InputConsumer) provide the primary API:

acquire()          → returns Context type (for `with` blocks)
acquire_stage()    → returns Stage type  (linear, compiler-enforced)
acquire_if_needed  → returns Context type (try-acquire pattern)
try_acquire()      → non-blocking readiness check

InputTilePipeline also exposes direct linear-type acquire methods:

acquire_producer() → InputProducerStage (linear)
acquire_consumer() → InputConsumerStage (linear)

Context Manager API (scoped, automatic)

Use acquire() for automatic cleanup via with blocks. The context type's __exit__ handles barrier signaling and stage advancement:

with producer.acquire() as tiles:   # BLOCKS until consumer frees stage
    load_tiles(tiles)                # safe to write
                                     # EXIT: advances producer automatically

with consumer.acquire() as tiles:   # BLOCKS until producer fills stage
    use_tiles(tiles)                 # safe to read
                                     # EXIT: advances consumer automatically
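The scoped behavior above can be imitated with a toy context manager. This is a minimal sketch, not the module's implementation: ToyTiles is a hypothetical stand-in for a context type such as ProducerTiles, and the barrier wait and signal are replaced by print calls. It only illustrates that __enter__ runs when the with block starts and __exit__ runs when the block ends. The syntax assumes a recent Mojo release.

```mojo
struct ToyTiles:
    """Hypothetical stand-in for a context type (not part of tile_pipeline)."""

    var stage: Int

    fn __init__(out self, stage: Int):
        self.stage = stage

    fn __enter__(self) -> Int:
        # A real pipeline would block on a barrier here; this toy only logs.
        print("acquired stage", self.stage)
        return self.stage

    fn __exit__(self):
        # A real pipeline would signal the barrier and advance the stage here.
        print("released stage", self.stage)


fn main():
    for i in range(2):
        with ToyTiles(i) as stage:
            print("working on stage", stage)
```

Because the release lives in __exit__, the body of the with block cannot forget it, which is the property the context types rely on.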

Linear Type API (flat, compiler-enforced)

Use acquire_stage() for explicit control with compile-time safety. The compiler errors if you forget to call release():

var tiles = producer.acquire_stage()       # BLOCKS until consumer frees stage
load_tiles(tiles)                          # safe to write
tiles^.release()                           # transfer + release (compiler enforces)

var input = pipeline.acquire_consumer()    # BLOCKS until producer fills stage
process(input)                             # safe to read
input^.release()                           # transfer + release (compiler enforces)

The ^ transfer operator is required because release(deinit self) consumes the value. Omitting it is a compile error.
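The same rule can be tried in isolation with a toy linear type. This is a sketch under stated assumptions rather than the pipeline's own code: ToyStage is a hypothetical name, its release() only prints, and the example assumes a Mojo release that supports @explicit_destroy and the deinit argument convention, as the stage types described above do.

```mojo
@explicit_destroy
struct ToyStage(Movable):
    """Hypothetical linear type (not part of tile_pipeline)."""

    var index: Int

    fn __init__(out self, index: Int):
        self.index = index

    fn release(deinit self):
        # Consumes the value. A real stage would signal its barrier here.
        print("released stage", self.index)


fn main():
    var stage = ToyStage(0)
    print("working on stage", stage.index)
    # The ^ suffix transfers ownership of `stage` into release().
    stage^.release()
```

Deleting the final stage^.release() line should make the program fail to compile, because an @explicit_destroy value has no implicit destructor for the compiler to fall back on.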

When to Use Which

  • Context manager (acquire): Default choice. Most kernel code uses this. Scoped cleanup is safe, readable, and works well with nested pipelines.
  • Linear type (acquire_stage): Use when context managers create excessive nesting, or when you need explicit control over release ordering. The compiler catches forgotten releases at compile time.

Example: TMA Load Warp (context manager)

with input_pipeline.producer() as producer:
    for current in load_iter:
        scheduler.throttle_signal(ctx.is_first_cta_in_cluster)
        for i in range(num_iters):
            with producer.acquire() as tiles:
                tma_load(tiles.a_tile(), tiles.b_tile())
    producer.drain()

Example: MMA Warp (linear types, flat)

var mma_handle = MmaHandle.create(...)
for _ in mma_iter:
    for _ in range(num_iters):
        var mma_stage = mma_handle.acquire_k_stage_linear()
        var input_tiles = input_pipeline.acquire_consumer()
        mma(input_tiles, mma_op, ...)
        input_tiles^.release()
        mma_stage^.release()
mma_handle^.release()

Example: Epilogue Warp (context manager)

with epi_ctx:
    for current in epi_iter:
        with output_pipeline.consumer() as output_stage:
            write_output(output_stage)

comptime values

MbarPtr

comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]

Structs