Mojo module
tile_pipeline
Tile pipeline for SM100 producer-consumer synchronization.
Provides staged tile storage with producer-consumer barrier synchronization for TMA-MMA pipeline coordination. Two complementary APIs offer either automatic scoped cleanup (context managers) or compiler-enforced explicit cleanup (linear types).
All tiles use TileTensor natively.
Key Abstractions
- InputTilePipeline[Payload]: Generic pipeline with payload abstraction
- OutputTilePipeline: TMEM accumulator stages for MMA→Epilogue pipeline
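The handshake behind these pipelines can be sketched in plain Python. This is a minimal model, not the Mojo API. It assumes the usual ring-buffer scheme:

- Each stage has a "full" signal (producer to consumer) and an "empty" signal (consumer to producer).
- The k-th use of the ring maps to stage k % N in round k // N.

The class and function names are hypothetical. Per-stage counters stand in for mbarrier phases (hardware tracks only the phase parity), and a threading.Condition stands in for the shared-memory barriers.

```python
import threading


class StagedPipeline:
    """Toy N-stage ring buffer with per-stage "full" and "empty" signals."""

    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.slots = [None] * num_stages
        self.full = [0] * num_stages   # completed fills per stage
        self.empty = [0] * num_stages  # completed drains per stage
        self.cond = threading.Condition()

    def produce(self, k, value):
        stage, rnd = k % self.num_stages, k // self.num_stages
        with self.cond:
            # Wait until the consumer has freed this stage `rnd` times.
            self.cond.wait_for(lambda: self.empty[stage] >= rnd)
            self.slots[stage] = value
            self.full[stage] += 1  # signal "full" to the consumer
            # Invariant: never more than N filled-but-unconsumed stages.
            assert sum(self.full) - sum(self.empty) <= self.num_stages
            self.cond.notify_all()

    def consume(self, k):
        stage, rnd = k % self.num_stages, k // self.num_stages
        with self.cond:
            # Wait until the producer has filled this stage `rnd + 1` times.
            self.cond.wait_for(lambda: self.full[stage] > rnd)
            value = self.slots[stage]
            self.empty[stage] += 1  # signal "empty" to the producer
            self.cond.notify_all()
        return value


def run(num_stages=2, num_items=7):
    pipe = StagedPipeline(num_stages)
    out = []

    def producer():
        for k in range(num_items):
            pipe.produce(k, k * k)

    def consumer():
        for k in range(num_items):
            out.append(pipe.consume(k))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(run())  # [0, 1, 4, 9, 16, 25, 36]
```

With num_stages=2 the producer can run at most two items ahead of the consumer. The output order is fixed because the consumer visits stages in the same cyclic order the producer fills them.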
Type Architecture: Two-Type Design
Input pipeline access uses a two-type design that separates compiler-enforced linear types from ergonomic context wrappers:
```
┌─────────────────────────────┐   ┌─────────────────────────────┐
│ InputProducerStage          │   │ InputConsumerStage          │
│ @explicit_destroy, Movable  │   │ @explicit_destroy, Movable  │
│ release(deinit self)        │   │ release(deinit self)        │
│ Compiler enforces release   │   │ Compiler enforces release   │
└─────────────────────────────┘   └─────────────────────────────┘
┌─────────────────────────────┐   ┌─────────────────────────────┐
│ ProducerTiles               │   │ ConsumerTiles               │
│ TrivialRegisterPassable     │   │ TrivialRegisterPassable     │
│ __enter__/__exit__          │   │ __enter__/__exit__          │
│ Auto-releases on exit       │   │ Auto-releases on exit       │
└─────────────────────────────┘   └─────────────────────────────┘
```
Both type pairs share the same accessor interface (payload, stage, barrier/mbar, expect_bytes). They differ in ownership semantics:
- Stage types (InputProducerStage, InputConsumerStage): @explicit_destroy linear types. The compiler reports an error if you forget to call release(). Use them for flat code with explicit resource management.
- Context types (ProducerTiles, ConsumerTiles): TrivialRegisterPassable wrappers. They auto-release via __exit__ when used in a `with` block. Use them for scoped, automatic resource management.
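Python has no linear types, so the two-type design can only be approximated at run time. In this sketch, all names are hypothetical:

- Stage.release() may run exactly once. Mojo's release(deinit self) guarantees this at compile time.
- Tiles is a context wrapper whose __exit__ calls release().
- check_drained() raises if any stage was never released. It stands in for the compile-time error on a forgotten release.

Modeling the context type as a wrapper around the stage type is an assumption made for illustration.

```python
class Stage:
    """Linear-style handle: release() must run exactly once."""

    def __init__(self, pipeline, index):
        self.pipeline = pipeline
        self.index = index
        self.released = False

    def release(self):
        if self.released:
            # Mojo rules this out statically: release(deinit self) consumes the value.
            raise RuntimeError("stage already released")
        self.released = True
        self.pipeline.outstanding -= 1
        self.pipeline.log.append(("release", self.index))


class Tiles:
    """Context wrapper: releases its stage automatically on scope exit."""

    def __init__(self, stage):
        self.stage = stage

    def __enter__(self):
        return self.stage

    def __exit__(self, exc_type, exc, tb):
        self.stage.release()
        return False  # never swallow exceptions


class Pipeline:
    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.next_index = 0
        self.outstanding = 0
        self.log = []

    def acquire_stage(self):
        # The blocking barrier wait is omitted in this single-threaded model.
        stage = Stage(self, self.next_index % self.num_stages)
        self.next_index += 1
        self.outstanding += 1
        self.log.append(("acquire", stage.index))
        return stage

    def acquire(self):
        return Tiles(self.acquire_stage())

    def check_drained(self):
        # Run-time stand-in for the compile-time "forgot to release" error.
        if self.outstanding:
            raise RuntimeError(f"{self.outstanding} stage(s) never released")


p = Pipeline(num_stages=2)
with p.acquire() as s:  # context type: released when the block exits
    pass                # ... access the stage's tiles here
t = p.acquire_stage()   # stage type: released explicitly
t.release()
p.check_drained()
print(p.log)  # [('acquire', 0), ('release', 0), ('acquire', 1), ('release', 1)]
```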
Acquire API
Role handles (InputProducer, InputConsumer) provide the primary API:
```
acquire()          → returns Context type (for `with` blocks)
acquire_stage()    → returns Stage type (linear, compiler-enforced)
acquire_if_needed  → returns Context type (try-acquire pattern)
try_acquire()      → non-blocking readiness check
```
InputTilePipeline also exposes direct linear-type acquire methods:
```
acquire_producer() → InputProducerStage (linear)
acquire_consumer() → InputConsumerStage (linear)
```
Context Manager API (scoped, automatic)
Use acquire() for automatic cleanup via `with` blocks. The context type's __exit__ handles barrier signaling and stage advancement:
```mojo
with producer.acquire() as tiles:   # BLOCKS until consumer frees stage
    load_tiles(tiles)               # safe to write
# EXIT: advances producer automatically

with consumer.acquire() as tiles:   # BLOCKS until producer fills stage
    use_tiles(tiles)                # safe to read
# EXIT: advances consumer automatically
```
Linear Type API (flat, compiler-enforced)
Use acquire_stage() for explicit control with compile-time safety. The compiler reports an error if you forget to call release():
```mojo
var tiles = producer.acquire_stage()      # BLOCKS until consumer frees stage
load_tiles(tiles)                         # safe to write
tiles^.release()                          # transfer + release (compiler enforces)

var input = pipeline.acquire_consumer()   # BLOCKS until producer fills stage
process(input)                            # safe to read
input^.release()                          # transfer + release (compiler enforces)
```
The ^ transfer operator is required because release(deinit self) consumes the value. Omitting it is a compile error.
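The Acquire API list also pairs a non-blocking try_acquire() with acquire_if_needed for a try-acquire pattern. The pattern modeled here is an assumption, drawn by analogy with other mbarrier-based pipelines rather than taken from this module. It works in three steps:

1. Poll the barrier early.
2. Overlap independent work.
3. Block only if the poll reported the stage as not ready.

A threading.Event plays the barrier. Barrier, step, and this acquire_if_needed signature are hypothetical.

```python
import threading


class Barrier:
    """Toy stage barrier that becomes ready once signalled."""

    def __init__(self):
        self.event = threading.Event()
        self.blocking_waits = 0  # number of times the blocking path was taken

    def signal(self):
        self.event.set()

    def try_wait(self):
        # Non-blocking readiness check (the role of try_acquire()).
        return self.event.is_set()

    def wait(self):
        self.blocking_waits += 1
        self.event.wait()


def acquire_if_needed(barrier, already_ready):
    # Block only if the earlier poll did not already see the stage as ready.
    if not already_ready:
        barrier.wait()


def step(barrier, other_work):
    ready = barrier.try_wait()  # poll early
    other_work()                # overlap independent work with the wait
    acquire_if_needed(barrier, ready)
    return ready


fast = Barrier()
fast.signal()
print(step(fast, lambda: None), fast.blocking_waits)  # True 0

slow = Barrier()
print(step(slow, slow.signal), slow.blocking_waits)   # False 1
```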
When to Use Which
- Context manager (acquire()): Default choice. Most kernel code uses this. Scoped cleanup is safe, readable, and works well with nested pipelines.
- Linear type (acquire_stage()): Use when context managers create excessive nesting, or when you need explicit control over release ordering. The compiler catches forgotten releases at compile time.
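The point about release ordering can be made concrete with a small Python sketch. Handle, nested, and staggered are hypothetical names.

- Nested `with` blocks always release in reverse acquisition order.
- A loop that holds stage k while acquiring stage k+1 needs overlapping lifetimes that a fixed nesting of `with` blocks cannot express. Explicit release allows this kind of ordering.

```python
class Handle:
    """Hypothetical stage handle that logs its acquire and release."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        log.append(f"acquire {name}")

    def release(self):
        self.log.append(f"release {self.name}")

    # Context-manager protocol, so the same handle also works in a with block.
    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.release()
        return False


def nested(log):
    # Scoped: release order is always the reverse of acquisition order.
    with Handle("k0", log):
        with Handle("k1", log):
            pass


def staggered(log, n):
    # Explicit: keep stage k until stage k+1 is held, then release k.
    current = Handle("k0", log)
    for k in range(1, n):
        nxt = Handle(f"k{k}", log)
        current.release()
        current = nxt
    current.release()


log = []
nested(log)
print(log)  # ['acquire k0', 'acquire k1', 'release k1', 'release k0']

log = []
staggered(log, 3)
print(log)
# ['acquire k0', 'acquire k1', 'release k0', 'acquire k2', 'release k1', 'release k2']
```

This page does not say whether the kernels in this module use this particular overlap. The sketch only shows why flat, explicit release is more flexible than scoped release.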
Example: TMA Load Warp (context manager)
```mojo
with input_pipeline.producer() as producer:
    while work_iter.has_work():
        with work_iter.next() as current:
            for i in range(num_iters):
                with producer.acquire() as tiles:
                    tma_load(tiles.a_tile(), tiles.b_tile())
    producer.drain()
```
Example: MMA Warp (linear types, flat)
```mojo
var mma_handle = MmaHandle.create(...)
while work_iter.has_work():
    with work_iter.wait_and_advance():
        for _ in range(num_iters):
            var mma_stage = mma_handle.acquire_k_stage_linear()
            var input_tiles = input_pipeline.acquire_consumer()
            mma(input_tiles, mma_op, ...)
            input_tiles^.release()
            mma_stage^.release()
mma_handle^.release()
```
Example: Epilogue Warp (context manager)
```mojo
with epi_ctx:
    while work_iter.has_work():
        with work_iter.next() as current:
            with output_pipeline.consumer() as output_stage:
                write_output(output_stage)
```
comptime values
MbarPtr
comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]
Structs
- BlockScaledTilePayload: Tile payload for block-scaled matmul (A, B, SFA, SFB tiles).
- BlockwiseFP8TilePayload: Tile payload for blockwise FP8 matmul (A, B, A-scales tiles).
- ConsumerTiles: Context manager for consuming one input pipeline stage.
- EpilogueKContext: Per-K context manager for epilogue warp in blockwise FP8.
- EpilogueKStage: Per-K stage for epilogue warp in blockwise FP8.
- EpilogueStage: Unified linear type handle for epilogue stage in output pipeline.
- InputConsumer: Consumer view for MMA warp. Use acquire() to get stages.
- InputConsumerStage: Linear type handle for consumer tile access.
- InputProducer: Producer view for TMA Load warp. Use acquire() to get stages.
- InputProducerStage: Linear type handle for producer tile access.
- InputTilePipeline: Tile pipeline with configurable payload type.
- MmaKStage: Per-K stage context for MMA warp in blockwise FP8.
- MmaStage: Unified linear type handle for MMA stage in output pipeline.
- OutputConsumer: Consumer view for epilogue warp (output pipeline).
- OutputKPipeline: Per-K-iteration view of OutputTilePipeline.
- OutputProducer: Producer view for MMA warp (output pipeline).
- OutputStage: Acquired output stage with TMEM handle and pipeline reference.
- OutputTilePipeline: Pipeline for MMA→Epilogue TMEM stage synchronization.
- PerKConsumerStage: Context manager for per-K epilogue consumption.
- ProducerTiles: Context manager for producing one input pipeline stage.
- StandardTilePayload: Tile payload for standard matmul (A and B tiles).
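The payload structs above differ in which tiles one stage carries. The sketch below models two of them in Python and derives an expect_bytes value. It assumes, without confirmation from this page, that expect_bytes equals the total bytes the TMA loads write into the stage. That total is what a transaction-counting mbarrier must be told to expect. Tile, the payload classes, and the shapes and bit widths in the usage lines are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    rows: int
    cols: int
    bits_per_elem: int  # e.g. 16 for bf16, 8 for fp8, 4 for fp4

    @property
    def nbytes(self) -> int:
        return self.rows * self.cols * self.bits_per_elem // 8


@dataclass(frozen=True)
class StandardPayload:
    """A and B tiles (cf. StandardTilePayload)."""
    a: Tile
    b: Tile

    def tiles(self):
        return [self.a, self.b]


@dataclass(frozen=True)
class BlockScaledPayload:
    """A, B and their scale-factor tiles (cf. BlockScaledTilePayload)."""
    a: Tile
    b: Tile
    sfa: Tile
    sfb: Tile

    def tiles(self):
        return [self.a, self.b, self.sfa, self.sfb]


def expect_bytes(payload) -> int:
    # Assumed: the producer announces this many bytes on the stage's "full"
    # barrier; the phase completes once they (and the expected arrivals) land.
    return sum(t.nbytes for t in payload.tiles())


std = StandardPayload(Tile(128, 64, 16), Tile(128, 64, 16))
print(expect_bytes(std))  # 32768

bs = BlockScaledPayload(
    Tile(128, 128, 4), Tile(128, 128, 4),  # hypothetical fp4 A and B tiles
    Tile(128, 4, 8), Tile(128, 4, 8),      # hypothetical 8-bit scale factors
)
print(expect_bytes(bs))  # 17408
```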