Mojo module

tile_writer

TileWriter module for efficient tile writing in GPU matrix multiplication.

This module provides utilities for writing tiles to memory using different mechanisms and destinations:

  1. Register → Shared Memory: Uses st.matrix hardware instruction for efficient storage of WGMMA accumulator results to shared memory with swizzling.

  2. Register → Global Memory: Direct stores from register tiles to global memory with optional epilogue processing and bounds checking.

  3. Shared Memory → Global Memory: Hardware-accelerated TMA stores or regular stores for efficient 2D tile transfers from shared to global memory.
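
The second path (direct stores with an optional epilogue and bounds checking) can be illustrated with a small host-side sketch. This is a conceptual model in plain Python, not the module's Mojo API: nested lists stand in for register tiles and global memory, and the function name `write_tile_to_global` and its parameters are hypothetical. It assumes non-negative tile offsets.

```python
# Conceptual sketch only: plain Python lists stand in for GPU register tiles
# and global memory. All names here are hypothetical, not the Mojo module's API.

def write_tile_to_global(dst, tile, row0, col0, epilogue=None):
    """Store `tile` into `dst` with its top-left corner at (row0, col0).

    Elements that would land outside `dst` are skipped (bounds checking).
    If `epilogue` is given, it is applied to each value before the store.
    Returns the number of elements actually written.
    """
    n_rows = len(dst)
    n_cols = len(dst[0]) if n_rows else 0
    written = 0
    for i, tile_row in enumerate(tile):
        r = row0 + i
        if r >= n_rows:
            break  # remaining tile rows fall past the bottom edge
        for j, value in enumerate(tile_row):
            c = col0 + j
            if c >= n_cols:
                break  # remaining columns fall past the right edge
            dst[r][c] = epilogue(value) if epilogue else value
            written += 1
    return written


# A 3x5 output matrix and a 2x4 tile placed at (2, 4): only one element fits.
out = [[0.0] * 5 for _ in range(3)]
tile = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
count = write_tile_to_global(out, tile, 2, 4, epilogue=lambda v: v * 2.0)
```

In this example the tile overhangs both the right and bottom edges, so only its top-left element is stored, after the epilogue doubles it. On a GPU the same check would typically be done per thread against the matrix dimensions rather than in a loop.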

Two main traits abstract these writing mechanisms:

  • TileWriter: For shared memory → global memory transfers
  • RegTileWriter: For register → memory (shared or global) transfers

Structs

Traits