
Mojo module

tile_writer

TileWriter components for SM100 matrix multiplication epilogue.

This module provides modular components for the output pipeline:

  1. TMAStoreWriter: TMA async store from shared memory to global memory
  2. StMatrixWriter: Register to shared memory via st.matrix instructions
  3. TMEMReader: Load accumulator data from tensor memory to registers
  4. EpilogueApplier: Apply element-wise operations on fragments

The SM100 epilogue pipeline flows as: TMEM (accumulators) → Registers → SMEM → GMEM (via TMA)
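The data flow above can be pictured with a small CPU-only model. This is a sketch in plain Python, not the Mojo API: every function name below is hypothetical, the tile sizes are arbitrary, and reading `(n_coord, m_coord)` as a (tile column, tile row) pair is an assumption based only on the argument order in the usage snippet. Placing the element-wise step between the TMEM read and the shared-memory write is also an assumption, inferred from the description of EpilogueApplier as operating on fragments.

```python
# Toy model of: TMEM (accumulators) -> registers -> SMEM -> GMEM.
# Hypothetical names throughout; nothing here is part of the tile_writer module.

TILE_M, TILE_N = 2, 4  # assumed tile shape, chosen only for illustration


def tmem_read(tmem_tile):
    """Stage 1: copy the accumulator tile from 'tensor memory' into 'registers'."""
    return [row[:] for row in tmem_tile]


def apply_epilogue(fragment, op):
    """Stage 2: apply an element-wise operation to the register fragment."""
    return [[op(v) for v in row] for row in fragment]


def write_to_smem(fragment):
    """Stage 3: stage the fragment in 'shared memory' (here, just another copy)."""
    return [row[:] for row in fragment]


def tma_store(gmem, smem_tile, coord):
    """Stage 4: write the staged tile to its place in 'global memory'.

    coord is (n_coord, m_coord): column tile index first, then row tile index
    (an assumed interpretation of the usage snippet's argument order).
    """
    n_coord, m_coord = coord
    row0 = m_coord * TILE_M
    col0 = n_coord * TILE_N
    for i, row in enumerate(smem_tile):
        for j, v in enumerate(row):
            gmem[row0 + i][col0 + j] = v


def run_epilogue(tmem_tile, gmem, coord, op=lambda v: v):
    """Chain the four stages in data-flow order."""
    regs = tmem_read(tmem_tile)
    regs = apply_epilogue(regs, op)
    smem = write_to_smem(regs)
    tma_store(gmem, smem, coord)
    return gmem


# A 4x8 output matrix, i.e. a 2x2 grid of 2x4 tiles.
gmem = [[0.0] * 8 for _ in range(4)]
acc = [[1.0, -2.0, 3.0, -4.0], [-5.0, 6.0, -7.0, 8.0]]

# Write the tile at column tile 1, row tile 0, applying a ReLU-style epilogue.
run_epilogue(acc, gmem, coord=(1, 0), op=lambda v: max(v, 0.0))
```

In this model the stages run in data-flow order, which is roughly the reverse of the order in which the components are listed above.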

Usage:

    # TMA store from shared memory to global memory
    var tma_writer = TMAStoreWriter...
    tma_writer.store_tile(c_smem_tile, (n_coord, m_coord))

comptime values

RLayout32Bits

comptime RLayout32Bits[layout: Layout] = RuntimeLayout[layout, element_type=DType.uint32, linear_idx_type=DType.uint32]
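To illustrate what a 32-bit linear index type implies, here is a small Python sketch. It assumes that `element_type` selects the integer type for shape and stride values and that `linear_idx_type` selects the type of the computed linear offset; the helper names are hypothetical and not part of the Mojo `layout` package.

```python
# Sketch of a strided layout whose linear index must fit in an unsigned 32-bit
# integer. Hypothetical helpers; not the Mojo RuntimeLayout implementation.

UINT32_MAX = 2**32 - 1


def row_major_stride(shape):
    """Return row-major strides for a shape, e.g. (128, 64) -> (64, 1)."""
    stride = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        stride[i] = stride[i + 1] * shape[i + 1]
    return tuple(stride)


def linear_index(coord, stride):
    """Map a coordinate to a linear offset, rejecting values beyond uint32."""
    idx = sum(c * s for c, s in zip(coord, stride))
    if not 0 <= idx <= UINT32_MAX:
        raise OverflowError(f"linear index {idx} does not fit in uint32")
    return idx


stride = row_major_stride((128, 64))
offset = linear_index((3, 5), stride)  # 3 * 64 + 5 * 1
```

A 32-bit index uses half the storage of a 64-bit one, at the cost of limiting addressable offsets to values below 2^32.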

Parameters

ThreadwiseStoreWriter

comptime ThreadwiseStoreWriter = TileWriterThreadwise[?, ?, ?]

TMAStoreWriter

comptime TMAStoreWriter = TileWriterTMA

Structs

Functions
