Mojo module

tile_writer

TileWriter module for efficient tile writing in GPU matrix multiplication.

This module provides utilities for writing tiles to memory using different mechanisms and destinations:

  1. Register → Shared Memory: Uses the st.matrix hardware instruction to store WGMMA accumulator results to shared memory efficiently, with swizzling.

  2. Register → Global Memory: Direct stores from register tiles to global memory with optional epilogue processing and bounds checking.

  3. Shared Memory → Global Memory: Hardware-accelerated TMA stores or regular stores for efficient 2D tile transfers from shared to global memory.

Two main traits abstract these writing mechanisms:

  • SMemTileWriter: For shared memory → global memory transfers
  • RegTileWriter: For register → memory (shared or global) transfers
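To make the register → global memory path concrete, the following is a minimal CPU-side sketch in Python of a bounds-checked tile store with an optional epilogue. It is an illustration only, not this module's API: the function name `write_tile`, its parameters, and the `epilogue(row, col, value)` callback shape are all assumptions introduced here.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]
# Hypothetical epilogue signature: (global_row, global_col, value) -> value
Epilogue = Callable[[int, int, float], float]


def write_tile(
    dst: Matrix,
    tile: Matrix,
    row0: int,
    col0: int,
    epilogue: Optional[Epilogue] = None,
) -> int:
    """Copy `tile` into `dst` with its top-left corner at (row0, col0).

    Elements that would land outside `dst` are skipped (bounds checking).
    If `epilogue` is given, it is applied to each value before the store.
    Assumes row0 and col0 are non-negative. Returns the number of
    elements actually written.
    """
    rows = len(dst)
    cols = len(dst[0]) if rows else 0
    written = 0
    for i, tile_row in enumerate(tile):
        r = row0 + i
        if r >= rows:  # remaining tile rows fall past the bottom edge
            break
        for j, value in enumerate(tile_row):
            c = col0 + j
            if c >= cols:  # remaining tile columns fall past the right edge
                break
            dst[r][c] = epilogue(r, c, value) if epilogue else value
            written += 1
    return written


# A 4x4 tile placed at the right edge of a 3x5 output: only one column
# and three rows of the tile fit, so three elements are stored.
out = [[0.0] * 5 for _ in range(3)]
tile = [[1.0] * 4 for _ in range(4)]
count = write_tile(out, tile, 0, 4, epilogue=lambda r, c, v: 2.0 * v)
```

On a GPU each thread would handle only its own fragment of the tile, but the per-element logic (check the bounds, apply the epilogue, store) is the same idea.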

Structs

Traits

  • RegTileWriter: Base trait for writing register tiles to memory (shared or global) in matrix multiplication.
  • SMemTileWriter: Base trait for writing shared memory tiles to global memory in matrix multiplication.