
Mojo module

tile_writer

TileWriter module for efficient tile writing in GPU matrix multiplication.

This module provides utilities for writing matrix tiles from shared memory to global memory using two different mechanisms:

  1. TMA (Tensor Memory Accelerator): Hardware-accelerated stores for efficient 2D tile transfers from shared to global memory.

  2. Regular stores: Software-based synchronous stores with manual thread distribution and swizzling for optimal memory access patterns.

The TileWriter trait abstracts these writing mechanisms to provide a unified interface for the matmul kernel's consumer threads.
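The idea of one interface over two store mechanisms can be illustrated with a small conceptual model. This is a sketch in plain Python, not the module's Mojo API: the method name `write_tile`, the classes `BulkTileWriter` and `ThreadedTileWriter`, and the helper `store_output` are all hypothetical names chosen for illustration. Nested lists stand in for shared and global memory, and swizzling is omitted.

```python
from typing import List, Protocol

Matrix = List[List[float]]


class TileWriter(Protocol):
    """Hypothetical interface: copy `tile` into `dst` at (row, col)."""

    def write_tile(self, dst: Matrix, tile: Matrix, row: int, col: int) -> None:
        ...


class BulkTileWriter:
    """Stands in for a TMA-style store: the tile moves as one 2D block."""

    def write_tile(self, dst: Matrix, tile: Matrix, row: int, col: int) -> None:
        for r, tile_row in enumerate(tile):
            # One slice assignment per row models a bulk transfer.
            dst[row + r][col:col + len(tile_row)] = tile_row


class ThreadedTileWriter:
    """Stands in for regular stores: elements are split across threads."""

    def __init__(self, num_threads: int) -> None:
        self.num_threads = num_threads

    def write_tile(self, dst: Matrix, tile: Matrix, row: int, col: int) -> None:
        height, width = len(tile), len(tile[0])
        for tid in range(self.num_threads):
            # Each simulated thread handles every num_threads-th element.
            for idx in range(tid, height * width, self.num_threads):
                r, c = divmod(idx, width)
                dst[row + r][col + c] = tile[r][c]


def store_output(writer: TileWriter, dst: Matrix, tile: Matrix,
                 row: int, col: int) -> None:
    """Caller code depends only on the interface, not on the mechanism."""
    writer.write_tile(dst, tile, row, col)


if __name__ == "__main__":
    tile = [[1.0, 2.0], [3.0, 4.0]]
    out_bulk = [[0.0] * 4 for _ in range(4)]
    out_threaded = [[0.0] * 4 for _ in range(4)]
    store_output(BulkTileWriter(), out_bulk, tile, 2, 1)
    store_output(ThreadedTileWriter(num_threads=3), out_threaded, tile, 2, 1)
    print(out_bulk == out_threaded)
```

Because `store_output` only depends on the interface, the calling code stays the same whichever mechanism is selected, which is the role the page attributes to the TileWriter trait.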

Structs

Traits

  • TileWriter: Base trait for tile writing mechanisms in matrix multiplication.
