Mojo module
tile_pipeline
Tile pipeline for SM100 producer-consumer synchronization.
Provides staged tile storage with producer-consumer barrier synchronization for TMA-MMA pipeline coordination. Two complementary APIs offer either automatic scoped cleanup (context managers) or compiler-enforced explicit cleanup (linear types).
All tiles use TileTensor natively.
Key Abstractions
- InputTilePipeline[Payload]: Generic pipeline with payload abstraction
- OutputTilePipeline: TMEM accumulator stages for MMA→Epilogue pipeline
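Both pipelines follow the same staged producer-consumer handshake:

- The producer waits for a stage to be free, fills it, and signals it full.
- The consumer waits for the stage to be full, reads it, and signals it free.

The Python sketch below models that handshake on the host. One pair of semaphores per stage stands in for the shared-memory barriers. It illustrates the protocol under that assumption and is not this module's implementation. `StagedPipeline`, `produce`, and `consume` are hypothetical names.

```python
import threading


class StagedPipeline:
    """Hypothetical host-side model of an N-stage producer/consumer ring.

    Each stage has an 'empty' signal (consumer -> producer: slot is free)
    and a 'full' signal (producer -> consumer: slot holds data). These
    semaphores stand in for shared-memory barriers; this is not the Mojo API.
    """

    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.slots = [None] * num_stages
        # Every slot starts free, so every 'empty' signal starts raised.
        self.empty = [threading.Semaphore(1) for _ in range(num_stages)]
        self.full = [threading.Semaphore(0) for _ in range(num_stages)]

    def produce(self, items):
        for i, item in enumerate(items):
            stage = i % self.num_stages
            self.empty[stage].acquire()    # wait until the consumer freed the slot
            self.slots[stage] = item       # stands in for the TMA load
            self.full[stage].release()     # signal: stage is filled

    def consume(self, count, out):
        for i in range(count):
            stage = i % self.num_stages
            self.full[stage].acquire()     # wait until the producer filled the slot
            out.append(self.slots[stage])  # stands in for the MMA reading tiles
            self.empty[stage].release()    # signal: stage is free again


pipe = StagedPipeline(num_stages=2)
results = []
consumer = threading.Thread(target=pipe.consume, args=(6, results))
consumer.start()
pipe.produce([f"k{i}" for i in range(6)])
consumer.join()
print(results)  # ['k0', 'k1', 'k2', 'k3', 'k4', 'k5']
```

With two stages the producer can run at most two items ahead of the consumer. That bounded lookahead is what lets loads overlap with compute.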
Type Architecture: Two-Type Design
Input pipeline access uses a two-type design that separates compiler-enforced linear types from ergonomic context wrappers:
```
┌─────────────────────────────┐  ┌─────────────────────────────┐
│ InputProducerStage          │  │ InputConsumerStage          │
│ @explicit_destroy, Movable  │  │ @explicit_destroy, Movable  │
│ release(deinit self)        │  │ release(deinit self)        │
│ Compiler enforces release   │  │ Compiler enforces release   │
└─────────────────────────────┘  └─────────────────────────────┘
┌─────────────────────────────┐  ┌─────────────────────────────┐
│ ProducerTiles               │  │ ConsumerTiles               │
│ TrivialRegisterPassable     │  │ TrivialRegisterPassable     │
│ __enter__/__exit__          │  │ __enter__/__exit__          │
│ Auto-releases on exit       │  │ Auto-releases on exit       │
└─────────────────────────────┘  └─────────────────────────────┘
```

Both type pairs share the same accessor interface (`payload`, `stage`, `barrier`/`mbar`, `expect_bytes`). They differ in ownership semantics:
- Stage types (`InputProducerStage`, `InputConsumerStage`): `@explicit_destroy` linear types. The compiler reports an error if `release()` is never called. Use them for flat code with explicit resource management.
- Context types (`ProducerTiles`, `ConsumerTiles`): `TrivialRegisterPassable` wrappers. They auto-release via `__exit__` when used in a `with` block. Use them for scoped, automatic resource management.
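The two type families differ only in who calls `release()`. The Python sketch below is a hypothetical model: `Pipeline`, `Stage`, `Tiles`, and `advance` are invented for illustration. Python has no linear types, so a runtime double-release check stands in for the compile-time guarantee of `@explicit_destroy`. The blocking barrier waits are omitted.

```python
class Pipeline:
    """Hypothetical ring of stages; only tracks the current stage index."""

    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.index = 0

    def advance(self):
        self.index = (self.index + 1) % self.num_stages

    def acquire_stage(self):
        # Linear-style handle: the caller must call release() itself.
        return Stage(self, self.index)

    def acquire(self):
        # Context-style wrapper: release() happens in __exit__.
        return Tiles(self.acquire_stage())


class Stage:
    """Models a linear stage handle that must be released exactly once."""

    def __init__(self, pipeline, index):
        self.pipeline = pipeline
        self.index = index
        self._released = False

    def release(self):
        # Runtime stand-in for Mojo's compile-time @explicit_destroy check.
        if self._released:
            raise RuntimeError(f"stage {self.index} released twice")
        self._released = True
        self.pipeline.advance()


class Tiles:
    """Models a context wrapper that releases its stage on scope exit."""

    def __init__(self, stage):
        self.stage = stage

    def __enter__(self):
        return self.stage

    def __exit__(self, exc_type, exc, tb):
        self.stage.release()  # runs even if the with-body raised
        return False          # do not swallow exceptions


p = Pipeline(num_stages=2)
s = p.acquire_stage()
s.release()               # explicit, like tiles^.release()
with p.acquire() as t:
    pass                  # released automatically on exit
print(p.index)            # 0 (advanced twice around a 2-stage ring)
try:
    s.release()           # second release of the same handle
except RuntimeError as err:
    print(err)            # stage 0 released twice
```

Mojo moves the double-release and forgotten-release errors to compile time. In Python both are only observable at run time.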
Acquire API
Role handles (InputProducer, InputConsumer) provide the primary API:
```
acquire()           → returns Context type (for `with` blocks)
acquire_stage()     → returns Stage type (linear, compiler-enforced)
acquire_if_needed   → returns Context type (try-acquire pattern)
try_acquire()       → non-blocking readiness check
```

`InputTilePipeline` also exposes direct linear-type acquire methods:

```
acquire_producer()  → InputProducerStage (linear)
acquire_consumer()  → InputConsumerStage (linear)
```

Context Manager API (scoped, automatic)
Use `acquire()` for automatic cleanup via `with` blocks. The context type's `__exit__` handles barrier signaling and stage advancement:
```mojo
with producer.acquire() as tiles:   # BLOCKS until consumer frees stage
    load_tiles(tiles)               # safe to write
# EXIT: advances producer automatically

with consumer.acquire() as tiles:   # BLOCKS until producer fills stage
    use_tiles(tiles)                # safe to read
# EXIT: advances consumer automatically
```

Linear Type API (flat, compiler-enforced)
Use `acquire_stage()` for explicit control with compile-time safety. The compiler reports an error if you forget to call `release()`:

```mojo
var tiles = producer.acquire_stage()      # BLOCKS until consumer frees stage
load_tiles(tiles)                         # safe to write
tiles^.release()                          # transfer + release (compiler enforces)

var input = pipeline.acquire_consumer()   # BLOCKS until producer fills stage
process(input)                            # safe to read
input^.release()                          # transfer + release (compiler enforces)
```

The `^` transfer operator is required because `release(deinit self)` consumes the value. Omitting it is a compile error.
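With either API, releasing a stage signals the stage's barrier and moves the handle to the next stage. This page does not document the internal bookkeeping. The sketch below assumes the index-plus-phase-bit counter commonly used in mbarrier-based pipelines:

- The stage index wraps modulo the number of stages.
- A parity bit flips on each wrap, so a wait on a reused barrier can tell consecutive rounds apart.

`PipelineState` and `step` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Hypothetical (index, phase) counter for an N-stage ring."""
    num_stages: int
    index: int = 0
    phase: int = 0

    def step(self):
        # Called once per release: move to the next stage.
        self.index += 1
        if self.index == self.num_stages:
            # Wrapping around the ring flips the parity that barrier
            # waits compare against, so round k and round k+1 differ.
            self.index = 0
            self.phase ^= 1


state = PipelineState(num_stages=3)
trace = []
for _ in range(7):
    trace.append((state.index, state.phase))
    state.step()
print(trace)  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 0)]
```

Because only parity is tracked, a waiter can be at most one round behind. The producer-consumer handshake already enforces this, since a stage cannot be refilled before it is released.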
When to Use Which
- Context manager (`acquire`): The default choice, used by most kernel code. Scoped cleanup is safe, readable, and works well with nested pipelines.
- Linear type (`acquire_stage`): Use when context managers create excessive nesting, or when you need explicit control over release ordering. The compiler catches forgotten releases at compile time.
Example: TMA Load Warp (context manager)

```mojo
with input_pipeline.producer() as producer:
    for current in load_iter:
        scheduler.throttle_signal(ctx.is_first_cta_in_cluster)
        for i in range(num_iters):
            with producer.acquire() as tiles:
                tma_load(tiles.a_tile(), tiles.b_tile())
    producer.drain()
```

Example: MMA Warp (linear types, flat)
```mojo
var mma_handle = MmaHandle.create(...)
for _ in mma_iter:
    for _ in range(num_iters):
        var mma_stage = mma_handle.acquire_k_stage_linear()
        var input_tiles = input_pipeline.acquire_consumer()
        mma(input_tiles, mma_op, ...)
        input_tiles^.release()
        mma_stage^.release()
mma_handle^.release()
```

Example: Epilogue Warp (context manager)
```mojo
with epi_ctx:
    for current in epi_iter:
        with output_pipeline.consumer() as output_stage:
            write_output(output_stage)
```

comptime values
MbarPtr

```mojo
comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]
```
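`MbarPtr` points to a `SharedMemBarrier` in shared memory, and the stage accessors expose `expect_bytes`. The following is a rough mental model only. It assumes the barrier behaves like the SM90/SM100 hardware mbarrier with transaction counting. Under that assumption, a phase completes only after two conditions hold:

- The expected number of arrivals has happened.
- Every expected byte of the asynchronous (TMA) copy has landed.

The toy class below is hypothetical and does not mirror `SharedMemBarrier`'s actual methods. The tile sizes in the usage lines are illustrative.

```python
class MbarrierModel:
    """Hypothetical toy model of a transaction-counting barrier.

    A phase completes when all expected arrivals have happened AND all
    expected bytes have landed; completion flips the phase parity.
    """

    def __init__(self, arrive_count):
        self.arrive_count = arrive_count
        self.pending_arrivals = arrive_count
        self.pending_bytes = 0
        self.phase = 0

    def expect_bytes(self, nbytes):
        # Producer announces how many bytes the async copy will deliver.
        self.pending_bytes += nbytes

    def complete_bytes(self, nbytes):
        # Copy engine reports that nbytes have landed in shared memory.
        self.pending_bytes -= nbytes
        self._maybe_flip()

    def arrive(self):
        self.pending_arrivals -= 1
        self._maybe_flip()

    def _maybe_flip(self):
        if self.pending_arrivals == 0 and self.pending_bytes == 0:
            self.phase ^= 1
            self.pending_arrivals = self.arrive_count  # re-arm for next phase

    def test_wait(self, phase):
        # True once the phase with this parity has completed.
        return self.phase != phase


bar = MbarrierModel(arrive_count=1)
tile_bytes = 2 * 128 * 64 * 2        # illustrative: A and B tiles, 128x64 bf16 each
bar.expect_bytes(tile_bytes)
bar.arrive()                         # producer arrives; bytes still in flight
print(bar.test_wait(0))              # False
bar.complete_bytes(tile_bytes)
print(bar.test_wait(0))              # True
```

Waiting on bytes as well as arrivals means the consumer is released only after the data is actually in shared memory, not merely when the producer issued the copy.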
Structs
- `BlockScaledTilePayload`: Tile payload for block-scaled matmul (A, B, SFA, SFB tiles).
- `BlockwiseFP8TilePayload`: Tile payload for blockwise FP8 matmul (A, B, A-scales tiles).
- `ConsumerTiles`: Context manager for consuming one input pipeline stage.
- `EpilogueKContext`: Per-K context manager for epilogue warp in blockwise FP8.
- `EpilogueKStage`: Per-K stage for epilogue warp in blockwise FP8.
- `EpilogueStage`: Unified linear type handle for epilogue stage in output pipeline.
- `InputConsumer`: Consumer view for MMA warp. Use `acquire()` to get stages.
- `InputConsumerStage`: Linear type handle for consumer tile access.
- `InputProducer`: Producer view for TMA Load warp. Use `acquire()` to get stages.
- `InputProducerStage`: Linear type handle for producer tile access.
- `InputTilePipeline`: Tile pipeline with configurable payload type.
- `MmaKStage`: Per-K stage context for MMA warp in blockwise FP8.
- `MmaStage`: Unified linear type handle for MMA stage in output pipeline.
- `OutputConsumer`: Consumer view for epilogue warp (output pipeline).
- `OutputKPipeline`: Per-K-iteration view of `OutputTilePipeline`.
- `OutputProducer`: Producer view for MMA warp (output pipeline).
- `OutputStage`: Acquired output stage with TMEM handle and pipeline reference.
- `OutputTilePipeline`: Pipeline for MMA→Epilogue TMEM stage synchronization.
- `PerKConsumerStage`: Context manager for per-K epilogue consumption.
- `ProducerTiles`: Context manager for producing one input pipeline stage.
- `StandardTilePayload`: Tile payload for standard matmul (A and B tiles).