Mojo module
blockwise_fp8_output_writer
Output writer for blockwise FP8 SM100 matmul.
Handles the Register → SMEM → GMEM (via TMA) flow. Unlike standard matmul, where the accumulators are read from TMEM, the blockwise FP8 accumulators are already in registers.
Supports two write modes:
- write(): TMA store for standard non-grouped matmul
- write_absolute_with_bounds_check(): Element-by-element store for 1D2D grouped matmul, with bounds checking at expert boundaries
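The difference between the two modes can be modelled on the host. The following is only a conceptual Python sketch, not the Mojo API: the function names, the `row_end` parameter (an exclusive end row for the current expert), and the list-of-lists layout are all assumptions made for illustration. It shows why a tile that straddles an expert boundary might need a per-element store instead of a bulk one.

```python
# Conceptual model only: hypothetical names, not the module's real signatures.

def write_tile(out, tile, row0, col0):
    """Bulk store: copy the whole tile, assuming it lies fully inside `out`."""
    for r, row in enumerate(tile):
        out[row0 + r][col0:col0 + len(row)] = row


def write_tile_bounds_checked(out, tile, row0, col0, row_end):
    """Element-wise store at absolute coordinates.

    Rows at or past `row_end` (assumed: the exclusive end row of the
    current expert) and columns past the output width are skipped.
    """
    n_cols = len(out[0])
    for r, row in enumerate(tile):
        global_row = row0 + r
        if global_row >= row_end:
            break  # the remaining rows belong to the next expert
        for c, value in enumerate(row):
            global_col = col0 + c
            if global_col < n_cols:
                out[global_row][global_col] = value


# A 4x2 tile placed at row 1, but the expert owns rows up to (not including) 3.
out = [[0] * 4 for _ in range(6)]
tile = [[1, 1], [2, 2], [3, 3], [4, 4]]
write_tile_bounds_checked(out, tile, row0=1, col0=2, row_end=3)
```

In this model only rows 1 and 2 of `out` are updated. The rows of the tile that fall past the assumed expert boundary are dropped, so they do not overwrite the next expert's output.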
Structs
- BlockwiseFP8TileWriter: Write register accumulators to GMEM via SMEM and TMA.