
Mojo module

matmul_output

SM100 Matmul Output Pipeline - TMEM → SMEM → GMEM epilogue.

This module contains the output pipeline code for SM100 matmul:

  • copy_accum_to_gmem: Core epilogue pipeline (TMEM → Registers → SMEM → GMEM)
  • multi_stage_store_C: Output pipeline orchestration for standard matmul
  • multi_stage_store_C_split_k: Output pipeline for split-K matmul
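To make the division of labor concrete, here is a minimal host-side sketch in Python. It is not Mojo and not this module's code. It assumes the accumulator tile is written out in equal column slices, one per stage, and it models TMEM, SMEM, and GMEM as plain Python lists. The names `multi_stage_store` and `split_k_reduce` are hypothetical. The split-K helper shows only the generic idea that partial tiles computed over separate K slices must be summed. This page does not describe where or how `multi_stage_store_C_split_k` performs that reduction.

```python
# Hypothetical host-side model of the staged output path. TMEM, SMEM and GMEM
# are all plain Python lists here; no GPU behavior is simulated.
from typing import List

Matrix = List[List[float]]


def split_k_reduce(partials: List[Matrix]) -> Matrix:
    """Sum the partial accumulator tiles produced by independent K slices."""
    rows, cols = len(partials[0]), len(partials[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for tile in partials:
        for r in range(rows):
            for c in range(cols):
                out[r][c] += tile[r][c]
    return out


def multi_stage_store(accum: Matrix, gmem: Matrix, row0: int, col0: int,
                      num_stages: int) -> int:
    """Write an accumulator tile to gmem in num_stages equal column slices.

    Each slice is first copied into a small staging buffer (the stand-in for
    shared memory) and then written at the tile's global offset (row0, col0).
    Returns the number of stages issued.
    """
    rows, cols = len(accum), len(accum[0])
    if cols % num_stages != 0:
        raise ValueError("tile width must divide evenly into stages")
    stage_cols = cols // num_stages
    for stage in range(num_stages):
        c_begin = stage * stage_cols
        # Stand-in for TMEM -> registers -> SMEM: copy one column slice.
        smem = [row[c_begin:c_begin + stage_cols] for row in accum]
        # Stand-in for the SMEM -> GMEM store of that slice.
        for r in range(rows):
            for c in range(stage_cols):
                gmem[row0 + r][col0 + c_begin + c] = smem[r][c]
    return num_stages


# Two K slices of a 2x4 tile, stored into the right half of a 2x8 output.
partials = [[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
            [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]]
gmem = [[0.0] * 8 for _ in range(2)]
tile = split_k_reduce(partials)
multi_stage_store(tile, gmem, row0=0, col0=4, num_stages=2)
print(gmem[0])  # [0.0, 0.0, 0.0, 0.0, 2.0, 3.0, 4.0, 5.0]
```

Staging one slice at a time keeps the intermediate buffer small. Only one slice's staging buffer exists at any moment, not the whole tile.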

The output pipeline handles:

  • Loading accumulated results from Tensor Memory (TMEM)
  • Applying optional epilogue operations (bias, activation)
  • Writing to shared memory via st.matrix instructions
  • Transferring to global memory via TMA async stores
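A single epilogue stage can be modeled in the same way. This is a hypothetical plain-Python sketch, not the module's API. It assumes a per-column bias and uses ReLU only as an example activation. The hardware steps (the TMEM load, the st.matrix writes, and the TMA async store) appear only as stand-ins described in the comments.

```python
# Hypothetical model of one epilogue stage; not the module's API.
from typing import Callable, List, Optional

Matrix = List[List[float]]


def relu(x: float) -> float:
    """Example activation; the module's supported activations are not listed here."""
    return x if x > 0.0 else 0.0


def epilogue_stage(accum_chunk: Matrix,
                   bias: Optional[List[float]] = None,
                   activation: Optional[Callable[[float], float]] = None) -> Matrix:
    """Apply the optional epilogue to one accumulator chunk.

    accum_chunk stands in for values loaded from TMEM into registers. The
    returned list stands in for the shared-memory staging buffer that a
    TMA async store would then copy to global memory.
    """
    smem: Matrix = []
    for row in accum_chunk:
        out_row = []
        for c, acc in enumerate(row):
            value = acc + (bias[c] if bias is not None else 0.0)  # per-column bias
            if activation is not None:
                value = activation(value)
            out_row.append(value)
        smem.append(out_row)
    return smem


chunk = [[-1.5, 0.5], [2.0, -3.0]]
print(epilogue_stage(chunk, bias=[1.0, -1.0], activation=relu))
# [[0.0, 0.0], [3.0, 0.0]]
```

The model applies the bias and activation while the values are still in registers. This means each element is transformed once before it is staged, so no separate pass over global memory is needed afterward.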

Functions
