Mojo module
block_scaled_output_writer
BlockScaledTileWriter for SM100 block-scaled matmul output pipeline.
Writes accumulated results along the path TMEM → Registers → SMEM → GMEM, with the final store issued via TMA. Tiles are addressed with 3D coordinates (M, N, Batch) to support batched block-scaled matmul.
Uses structured building blocks from tile_writer.mojo:
- TmemArrayType / load_fragments() for TMEM load
- AccumBarrier.arrive() for barrier signaling
- TMEMToSMemWriter.write_fragments() for SMEM write
- TMAStoreExecutor with batched=True for 3D TMA stores
- tma_wait_pipelined() for TMA wait
Usage:
    var writer = BlockScaledTileWriter...
    writer.write(c_tiles, pipeline, stage, tmem_offset, coord, shape)
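The data path described above can be pictured with a small conceptual model. The sketch below is plain Python, not the Mojo API: `write_tile`, its parameters, the (batch, M, N) output layout, the use of tile-unit coordinates, and the clipping of out-of-range elements are all assumptions made for illustration. It only shows the order of the stages and how a 3D coordinate (M, N, Batch) could select the destination region.

```python
# Conceptual model only. Nested Python lists stand in for TMEM, registers,
# SMEM and GMEM; none of these names belong to the real Mojo module.

def write_tile(accum_tile, gmem, coord, tile_shape):
    """Copy one accumulator tile into a (batch, M, N) output.

    coord is assumed to be (m_tile, n_tile, batch) in tile units.
    """
    m_tile, n_tile, batch = coord
    bm, bn = tile_shape

    # Stage 1: TMEM -> registers (modelled as copying the accumulator rows).
    fragments = [row[:] for row in accum_tile]

    # Stage 2: registers -> SMEM (modelled as a second staging copy).
    smem_tile = [row[:] for row in fragments]

    # Stage 3: SMEM -> GMEM. The batch index picks the output matrix and the
    # (M, N) tile indices pick the top-left element of the destination region.
    m0, n0 = m_tile * bm, n_tile * bn
    out = gmem[batch]
    for i, row in enumerate(smem_tile):
        if m0 + i >= len(out):
            break  # rows past the end of the output are dropped in this model
        for j, value in enumerate(row):
            if n0 + j < len(out[0]):
                out[m0 + i][n0 + j] = value


# Two 4x4 output matrices (batch = 2), written with 2x2 tiles.
gmem = [[[0] * 4 for _ in range(4)] for _ in range(2)]
tile = [[1, 2], [3, 4]]
write_tile(tile, gmem, coord=(1, 0, 1), tile_shape=(2, 2))
```

In this model the call writes the tile into rows 2 to 3 and columns 0 to 1 of the second batch entry and leaves the first batch entry untouched. Barrier signaling and the pipelined TMA wait are omitted, since their behaviour is not described here.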
Structs
- BlockScaledTileWriter: Output tile writer for SM100 block-scaled matmul epilogue.