
Mojo module

tile_pipeline

Tile pipeline for SM100 producer-consumer synchronization.

Provides staged tile storage with producer-consumer barrier synchronization for TMA-MMA pipeline coordination. Two complementary APIs offer either automatic scoped cleanup (context managers) or compiler-enforced explicit cleanup (linear types).

All tiles use TileTensor natively.
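The staged producer-consumer handshake described above can be illustrated with a small host-side model. This is a Python sketch, not the Mojo API: `StagedPipeline`, `run`, and the use of semaphores as stand-ins for the shared-memory barriers are all assumptions made for illustration.

```python
import threading

class StagedPipeline:
    """Toy model: a ring of stages guarded by 'empty' and 'full' counters."""

    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.slots = [None] * num_stages
        self.empty = threading.Semaphore(num_stages)  # stages the producer may fill
        self.full = threading.Semaphore(0)            # stages the consumer may read

def run(num_stages=2, num_tiles=6):
    pipe = StagedPipeline(num_stages)
    received = []

    def producer():
        for i in range(num_tiles):
            pipe.empty.acquire()                  # block until a stage is free
            pipe.slots[i % num_stages] = i * i    # "load" a tile into the stage
            pipe.full.release()                   # signal: stage is filled

    def consumer():
        for i in range(num_tiles):
            pipe.full.acquire()                   # block until a stage is filled
            received.append(pipe.slots[i % num_stages])
            pipe.empty.release()                  # signal: stage is free again

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With two stages, the producer can run at most two tiles ahead of the consumer, which is the property that lets loads overlap with compute without overwriting unread data.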

Key Abstractions

  • InputTilePipeline[Payload]: Generic pipeline with payload abstraction
  • OutputTilePipeline: TMEM accumulator stages for MMA→Epilogue pipeline

Type Architecture: Two-Type Design

Input pipeline access uses a two-type design that separates compiler-enforced linear types from ergonomic context wrappers:

┌─────────────────────────────┐   ┌─────────────────────────────┐
│   InputProducerStage        │   │   InputConsumerStage        │
│   @explicit_destroy,Movable │   │   @explicit_destroy,Movable │
│   release(deinit self)      │   │   release(deinit self)      │
│   Compiler enforces release │   │   Compiler enforces release │
└─────────────────────────────┘   └─────────────────────────────┘

┌─────────────────────────────┐   ┌─────────────────────────────┐
│   ProducerTiles             │   │   ConsumerTiles             │
│   TrivialRegisterPassable   │   │   TrivialRegisterPassable   │
│   __enter__/__exit__        │   │   __enter__/__exit__        │
│   Auto-releases on exit     │   │   Auto-releases on exit     │
└─────────────────────────────┘   └─────────────────────────────┘

Both type pairs share the same accessor interface (payload, stage, barrier/mbar, expect_bytes). They differ in ownership semantics:

  • Stage types (InputProducerStage, InputConsumerStage): @explicit_destroy linear types. Compilation fails if a stage is never passed to release(). Use for flat code with explicit resource management.

  • Context types (ProducerTiles, ConsumerTiles): TrivialRegisterPassable wrappers. Auto-release via __exit__ when used in a with block. Use for scoped, automatic resource management.
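The split between the two type pairs can be modeled outside Mojo. The Python sketch below is an illustration only: `Stage`, `Tiles`, `total`, and the `released` list are hypothetical names, and Python cannot reproduce the compile-time check that the Mojo stage types provide. It shows one accessor (`payload`) working on both handle kinds, with release either called by hand or triggered on scope exit.

```python
class Stage:
    """Explicit handle: the caller is responsible for calling release()."""

    def __init__(self, payload, stage, released):
        self.payload = payload      # tile data held by this stage
        self.stage = stage          # stage index within the pipeline
        self._released = released   # shared record of released stage indices

    def release(self):
        self._released.append(self.stage)


class Tiles:
    """Scoped handle: same accessors, release happens in __exit__."""

    def __init__(self, payload, stage, released):
        self.payload = payload
        self.stage = stage
        self._released = released

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._released.append(self.stage)
        return False                # never swallow exceptions


def total(handle):
    # Accepts either handle kind because both expose .payload
    return sum(handle.payload)


released = []

s = Stage([1, 2, 3], 0, released)
explicit_sum = total(s)
s.release()                          # explicit release

with Tiles([4, 5, 6], 1, released) as t:
    scoped_sum = total(t)            # released automatically on exit
```

Code written against the shared accessors does not need to know which ownership style the caller picked.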

Acquire API

Role handles (InputProducer, InputConsumer) provide the primary API:

acquire()          → returns Context type (for `with` blocks)
acquire_stage()    → returns Stage type  (linear, compiler-enforced)
acquire_if_needed  → returns Context type (try-acquire pattern)
try_acquire()      → non-blocking readiness check
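The try-acquire pattern named in the table pairs a non-blocking check with a blocking wait that runs only when the check failed. The Python sketch below models that idea with a `threading.Event`. The class `StageBarrier` and the signature `acquire_if_needed(ready)` are assumptions for illustration; the source does not give the Mojo signatures.

```python
import threading

class StageBarrier:
    """Toy model of one stage's readiness barrier."""

    def __init__(self):
        self._ready = threading.Event()
        self.blocking_waits = 0      # counts how often we had to block

    def signal(self):
        # Called by the other side of the pipeline when the stage is ready.
        self._ready.set()

    def try_acquire(self):
        # Non-blocking readiness check; never waits.
        return self._ready.is_set()

    def acquire_if_needed(self, ready):
        # Block only when the earlier check reported "not ready".
        if not ready:
            self.blocking_waits += 1
            self._ready.wait()


# Case 1: the stage is already ready, so no blocking wait happens.
fast = StageBarrier()
fast.signal()
fast.acquire_if_needed(fast.try_acquire())

# Case 2: the stage becomes ready later, so the caller blocks once.
slow = StageBarrier()
was_ready = slow.try_acquire()
threading.Timer(0.01, slow.signal).start()
slow.acquire_if_needed(was_ready)
```

Issuing the check early and doing other work before the fallback wait is the usual motivation for splitting the two steps.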

InputTilePipeline also exposes direct linear-type acquire methods:

acquire_producer() → InputProducerStage (linear)
acquire_consumer() → InputConsumerStage (linear)

Context Manager API (scoped, automatic)

Use acquire() for automatic cleanup via with blocks. The context type's __exit__ handles barrier signaling and stage advancement:

with producer.acquire() as tiles:   # BLOCKS until consumer frees stage
    load_tiles(tiles)                # safe to write
                                     # EXIT: advances producer automatically

with consumer.acquire() as tiles:   # BLOCKS until producer fills stage
    use_tiles(tiles)                 # safe to read
                                     # EXIT: advances consumer automatically

Linear Type API (flat, compiler-enforced)

Use acquire_stage() for explicit control with compile-time safety. Compilation fails if an acquired stage is never passed to release():

var tiles = producer.acquire_stage()       # BLOCKS until consumer frees stage
load_tiles(tiles)                          # safe to write
tiles^.release()                           # transfer + release (compiler enforces)

var input = pipeline.acquire_consumer()    # BLOCKS until producer fills stage
process(input)                             # safe to read
input^.release()                           # transfer + release (compiler enforces)

The ^ transfer operator is required because release(deinit self) consumes the value. Omitting it is a compile error.
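Python has no linear types, but the consuming behavior of release can be approximated with runtime checks. In the sketch below, `LinearStage` and the `outstanding` set are hypothetical names: a handle becomes unusable after release, and a forgotten release shows up as a leftover entry. Mojo reports both mistakes at compile time; this model only detects them when the code runs.

```python
class LinearStage:
    """Runtime approximation of a handle that must be released exactly once."""

    def __init__(self, outstanding, index):
        self._outstanding = outstanding   # set of stage indices not yet released
        self.index = index
        self._live = True
        outstanding.add(index)

    def payload(self):
        if not self._live:
            raise RuntimeError("use after release")
        return f"tile-{self.index}"

    def release(self):
        if not self._live:
            raise RuntimeError("double release")
        self._live = False                # the handle is consumed from here on
        self._outstanding.discard(self.index)


outstanding = set()

stage = LinearStage(outstanding, 0)
tile = stage.payload()
stage.release()

try:
    stage.release()                       # second release is rejected
    double_release_caught = False
except RuntimeError:
    double_release_caught = True

forgotten = LinearStage(outstanding, 1)   # never released: stays in `outstanding`
```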

When to Use Which

  • Context manager (acquire): Default choice. Most kernel code uses this. Scoped cleanup is safe, readable, and works well with nested pipelines.
  • Linear type (acquire_stage): Use when context managers create excessive nesting, or when you need explicit control over release ordering. The compiler catches forgotten releases at compile time.
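The release-ordering point can be seen in a small Python model. Nested `with` blocks always release in last-in, first-out order, while flat code with explicit release calls can choose any order. `Handle`, `nested`, and `flat` are hypothetical names used only for this illustration.

```python
class Handle:
    """Toy resource that records the order in which it is released."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def release(self):
        self.log.append(self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


def nested():
    log = []
    with Handle("mma_stage", log):
        with Handle("input_tiles", log):
            pass                      # inner scope exits first, then outer
    return log


def flat():
    log = []
    mma_stage = Handle("mma_stage", log)
    input_tiles = Handle("input_tiles", log)
    mma_stage.release()               # order is chosen by the code
    input_tiles.release()
    return log
```

The flat version also stays at one indentation level regardless of how many handles are live at once.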

Example: TMA Load Warp (context manager)

with input_pipeline.producer() as producer:
    for current in load_iter:
        scheduler.throttle_signal(ctx.is_first_cta_in_cluster)
        for i in range(num_iters):
            with producer.acquire() as tiles:
                tma_load(tiles.a_tile(), tiles.b_tile())
    producer.drain()

Example: MMA Warp (linear types, flat)

var mma_handle = MmaHandle.create(...)
for _ in mma_iter:
    for _ in range(num_iters):
        var mma_stage = mma_handle.acquire_k_stage_linear()
        var input_tiles = input_pipeline.acquire_consumer()
        mma(input_tiles, mma_op, ...)
        input_tiles^.release()
        mma_stage^.release()
mma_handle^.release()

Example: Epilogue Warp (context manager)

with epi_ctx:
    for current in epi_iter:
        with output_pipeline.consumer() as output_stage:
            write_output(output_stage)

comptime values

MbarPtr

comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutUntrackedOrigin, address_space=AddressSpace.SHARED]

Structs