Mojo package

gpu

comptime values​

logger​

comptime logger = Logger(stdout, prefix=String(""), source_location=False)

Packages​

  • ​amd: Provides the AMD GPU backend implementations for matmuls.
  • ​amd_rdna: Provides the AMD RDNA GPU backend implementations for matmuls.
  • ​apple: Provides the Apple silicon GPU backend implementations for matmuls.
  • ​sm100: Provides the Nvidia Blackwell backend implementations for matmuls.
  • ​sm100_structured: SM100 Structured Kernels - Blackwell matmul implementation.
  • ​sm80: Provides the Nvidia Ampere backend implementations for matmuls.
  • ​sm90: Provides the Nvidia Hopper backend implementations for matmuls.

Modules​

Functions​

  • ​matmul_kernel: Matrix multiplication using shared memory. This version loads tile_size x tile_size blocks from A and B and updates a tile_size x tile_size tile of C. The thread block should have shape (tile_size, tile_size, 1), with each thread mapped to one element of C. The grid should have shape (N/tile_size, M/tile_size, 1); N is the first dimension so that memory accesses are coalesced.
  • ​matmul_kernel_naive:
  • ​multistage_gemm: TileTensor overload of multistage_gemm. Converts to LayoutTensor and dispatches to the appropriate GEMM kernel.
  • ​split_k_reduce:
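
The tiling scheme described for matmul_kernel can be sketched on the CPU as follows. This is a plain Python simulation, not the Mojo kernel: the function name `tiled_matmul`, the nested-list matrix layout, and the zero-padding of partial tiles are assumptions made for illustration. The loops over `block_x` and `block_y` stand in for the grid of shape (N/tile_size, M/tile_size, 1), and the loops over `tx` and `ty` stand in for the threads of one (tile_size, tile_size, 1) block.

```python
def tiled_matmul(a, b, tile_size):
    """Compute C = A @ B tile by tile (hypothetical CPU sketch)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    # Grid dimensions: x covers the N (column) dimension, y covers M (rows).
    # Ceiling division is an assumption so that sizes need not divide evenly.
    grid_x = -(-n // tile_size)
    grid_y = -(-m // tile_size)

    for block_y in range(grid_y):
        for block_x in range(grid_x):
            row0 = block_y * tile_size
            col0 = block_x * tile_size
            # One accumulator per simulated thread (ty, tx) in the block.
            acc = [[0.0] * tile_size for _ in range(tile_size)]

            # Walk along K one tile at a time.
            for k0 in range(0, k, tile_size):
                # Stand-ins for the shared-memory tiles of A and B,
                # zero-padded where the tile runs past the matrix edge.
                a_tile = [
                    [a[row0 + ty][k0 + tx]
                     if row0 + ty < m and k0 + tx < k else 0.0
                     for tx in range(tile_size)]
                    for ty in range(tile_size)
                ]
                b_tile = [
                    [b[k0 + ty][col0 + tx]
                     if k0 + ty < k and col0 + tx < n else 0.0
                     for tx in range(tile_size)]
                    for ty in range(tile_size)
                ]
                # Each simulated thread updates its own element of the C tile.
                for ty in range(tile_size):
                    for tx in range(tile_size):
                        for kk in range(tile_size):
                            acc[ty][tx] += a_tile[ty][kk] * b_tile[kk][tx]

            # Write the finished tile back to C, skipping padded positions.
            for ty in range(tile_size):
                for tx in range(tile_size):
                    if row0 + ty < m and col0 + tx < n:
                        c[row0 + ty][col0 + tx] = acc[ty][tx]
    return c
```

On a GPU, the two tile loads would be performed cooperatively by the threads of a block with a synchronization barrier before and after the inner product; the sequential simulation above needs no barrier because each step completes before the next begins.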