
Mojo module

tile_writer

TileWriter module for efficient tile writing in GPU matrix multiplication.

This module provides utilities for writing tiles to memory using different mechanisms and destinations:

  1. Register → Shared Memory: Uses st.matrix hardware instruction for efficient storage of WGMMA accumulator results to shared memory with swizzling.

  2. Register → Global Memory: Direct stores from register tiles to global memory with optional epilogue processing and bounds checking.

  3. Shared Memory → Global Memory: Hardware-accelerated TMA stores or regular stores for efficient 2D tile transfers from shared to global memory.
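As a rough illustration of the second path (register → global memory with optional epilogue processing and bounds checking), the sketch below models the idea on the CPU in plain Python. It is a conceptual model only, not this module's API: the function name `write_tile_bounds_checked`, its parameters, and the nested-list layout are all assumptions made for this example, and a real GPU kernel would distribute the elements across threads instead of looping over them.

```python
from typing import Callable, List, Optional

# Hypothetical signature for an epilogue: (row, col, value) -> value to store.
Epilogue = Callable[[int, int, float], float]


def write_tile_bounds_checked(
    dst: List[List[float]],
    tile: List[List[float]],
    row0: int,
    col0: int,
    epilogue: Optional[Epilogue] = None,
) -> int:
    """Copy `tile` into `dst` with its top-left corner at (row0, col0).

    Elements that would land outside `dst` are skipped (the bounds check),
    and `epilogue`, if given, transforms each value just before it is stored.
    Assumes row0 and col0 are non-negative. Returns the number of elements
    actually written.
    """
    num_rows = len(dst)
    num_cols = len(dst[0]) if num_rows > 0 else 0
    written = 0
    for i, tile_row in enumerate(tile):
        for j, value in enumerate(tile_row):
            r, c = row0 + i, col0 + j
            if r >= num_rows or c >= num_cols:
                continue  # out of bounds: this element is not stored
            dst[r][c] = epilogue(r, c, value) if epilogue is not None else value
            written += 1
    return written


if __name__ == "__main__":
    # A 2x2 tile placed at the bottom-right corner of a 3x3 matrix:
    # only the element at (2, 2) is in bounds.
    out = [[0.0] * 3 for _ in range(3)]
    count = write_tile_bounds_checked(
        out, [[1.0, 2.0], [3.0, 4.0]], 2, 2, epilogue=lambda r, c, v: v + 10.0
    )
    print(count, out[2][2])
```

Applying the epilogue at store time, as in this sketch, avoids a second pass over the output for operations such as adding a bias. The bounds check is what allows a tile to overhang the matrix edge when the matrix dimensions are not multiples of the tile size.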

Two main traits abstract these writing mechanisms:

  • TileWriter: For shared memory → global memory transfers
  • RegTileWriter: For register → memory (shared or global) transfers

Structs

Traits