Mojo module

tile_loader

TileLoader module for efficient tile loading in GPU matrix multiplication.

This module provides utilities for loading matrix tiles from global memory to shared memory using two different mechanisms:

  1. TMA (Tensor Memory Accelerator): Hardware-accelerated loads that can efficiently transfer 2D tiles with multicast support for multi-block clusters.

  2. cp.async: Software-based asynchronous copy instructions with manual bounds checking and swizzling for optimal shared memory access patterns.
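To make "manual bounds checking and swizzling" concrete, here is a minimal CPU-side sketch in Python. It is not the module's API and does not issue cp.async instructions. The names `swizzle_col`, `load_tile`, and `read_tile` are hypothetical, and the element-level XOR swizzle is an assumed, simplified stand-in for whatever layout the real loader uses. The sketch only shows the two ideas: out-of-range elements are replaced with a fill value, and each row's columns are permuted so that a column-wise read touches a different physical column in every row.

```python
def swizzle_col(row: int, col: int, num_cols: int) -> int:
    """Map a logical column to a physical column by XOR with the row index.

    Assumes num_cols is a power of two, so the result stays in [0, num_cols).
    """
    return col ^ (row % num_cols)


def load_tile(matrix, tile_row, tile_col, tile_h, tile_w, fill=0):
    """Copy one tile out of `matrix` into a swizzled tile buffer.

    Elements that fall outside the matrix are replaced by `fill`
    (the manual bounds check).
    """
    if tile_w <= 0 or tile_w & (tile_w - 1):
        raise ValueError("tile_w must be a power of two for the XOR swizzle")

    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    tile = [[fill] * tile_w for _ in range(tile_h)]

    for r in range(tile_h):
        for c in range(tile_w):
            global_r = tile_row * tile_h + r
            global_c = tile_col * tile_w + c
            in_bounds = global_r < rows and global_c < cols
            value = matrix[global_r][global_c] if in_bounds else fill
            tile[r][swizzle_col(r, c, tile_w)] = value
    return tile


def read_tile(tile, r, c):
    """Read logical element (r, c) back from a swizzled tile."""
    return tile[r][swizzle_col(r, c, len(tile[r]))]


# A 3x5 matrix whose entries encode their own coordinates.
matrix = [[10 * r + c for c in range(5)] for r in range(3)]

# The 4x4 tile at tile coordinates (0, 1) covers rows 0..3 and columns 4..7,
# so only column 4 of rows 0..2 lies inside the matrix.
tile = load_tile(matrix, tile_row=0, tile_col=1, tile_h=4, tile_w=4)
```

Reading through `read_tile` recovers the logical layout, while the underlying buffer stores logical column 0 at physical column 0 in row 0, column 1 in row 1, column 2 in row 2, and so on. That spread is the property a swizzle aims for when threads read a tile column from banked shared memory.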

The TileLoader struct abstracts these loading mechanisms to provide a unified interface for the matmul kernel's producer threads.

Structs

Traits

Functions