
Mojo module

block_scaled_output_writer

BlockScaledTileWriter for the SM100 block-scaled matmul output pipeline.

Writes accumulated results along the path TMEM → registers → SMEM → GMEM, with the final store to global memory issued via TMA. Output tiles are addressed with 3D coordinates (M, N, Batch) to support batched block-scaled matmul.
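As a rough illustration of what a 3D (M, N, Batch) coordinate refers to, the sketch below maps one to a flat element offset. It assumes a dense row-major output with the batch dimension outermost, which this page does not state. The function `flat_offset` and its parameters are hypothetical and are not part of this module. In the real pipeline the TMA descriptor resolves addresses from the coordinates, so no such helper is needed.

```mojo
# Hypothetical helper for illustration only; not part of block_scaled_output_writer.
# Assumes a dense output of shape (num_batches, rows, cols), row-major,
# with the batch dimension outermost.
fn flat_offset(m: Int, n: Int, batch: Int, rows: Int, cols: Int) -> Int:
    # Skip `batch` whole matrices, then `m` whole rows, then `n` elements.
    return (batch * rows + m) * cols + n


fn main():
    # Element (m=2, n=3) of batch 1 in a stack of 4 x 8 matrices.
    print(flat_offset(2, 3, 1, 4, 8))
```

With these assumed sizes, each batch spans 32 elements and each row spans 8, so the coordinate lands 32 + 16 + 3 elements past the start of the buffer.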

Uses structured building blocks from tile_writer.mojo:

  • TmemArrayType / load_fragments() for TMEM load
  • AccumBarrier.arrive() for barrier signaling
  • TMEMToSMemWriter.write_fragments() for SMEM write
  • TMAStoreExecutor with batched=True for 3D TMA stores
  • tma_wait_pipelined() for TMA wait

Usage:

  var writer = BlockScaledTileWriter...
  writer.write(c_tiles, pipeline, stage, tmem_offset, coord, shape)

Structs
