Mojo package

gpu

`comptime` values

`logger`

comptime logger = Logger(stdout, prefix=String(""), source_location=False)

Packages

amd: Provides the AMD GPU backend implementations for matmuls.
amd_rdna: Provides the AMD RDNA GPU backend implementations for matmuls.
apple: Provides the Apple silicon GPU backend implementations for matmuls.
sm100: Provides the Nvidia Blackwell backend implementations for matmuls.
sm100_structured: SM100 Structured Kernels - Blackwell matmul implementation.
sm80: Provides the Nvidia Ampere backend implementations for matmuls.
sm90: Provides the Nvidia Hopper backend implementations for matmuls.

Modules

Functions

matmul_kernel: Matrix multiplication using shared memory. This version loads blocks of size tile_size x tile_size from A and B and updates a tile_size x tile_size tile of C. The thread block should have shape (tile_size, tile_size, 1), and each thread is mapped to one element of C. The grid should have shape (N/tile_size, M/tile_size, 1); N is the first dimension for coalesced access.
matmul_kernel_naive:
multistage_gemm: TileTensor overload of multistage_gemm. Converts to LayoutTensor and dispatches to the appropriate GEMM kernel.
split_k_reduce:
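The tiling scheme described for `matmul_kernel` can be illustrated with a small CPU emulation. This is a sketch in Python, not the Mojo kernel: the names `tiled_matmul` and `naive_matmul` are hypothetical, the nested loops stand in for the GPU grid and thread block, and the per-tile lists stand in for shared memory. It assumes M, N, and K are all multiples of tile_size.

```python
def naive_matmul(A, B):
    """Reference result: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]


def tiled_matmul(A, B, tile_size):
    """CPU emulation of a shared-memory tiled matmul (hypothetical helper)."""
    M, K = len(A), len(A[0])
    if len(B) != K:
        raise ValueError("inner dimensions of A and B must match")
    N = len(B[0])
    if M % tile_size or N % tile_size or K % tile_size:
        raise ValueError("M, N, and K must be multiples of tile_size")
    T = tile_size
    C = [[0] * N for _ in range(M)]
    # Grid shape (N/T, M/T, 1): block_x walks columns of C, block_y walks rows.
    for block_y in range(M // T):
        for block_x in range(N // T):
            # One accumulator per thread (tx, ty) in the T x T block.
            acc = [[0] * T for _ in range(T)]
            for k0 in range(0, K, T):
                # Emulated shared memory: each thread loads one element of each tile.
                a_tile = [[A[block_y * T + ty][k0 + tx] for tx in range(T)]
                          for ty in range(T)]
                b_tile = [[B[k0 + ty][block_x * T + tx] for tx in range(T)]
                          for ty in range(T)]
                # (A GPU kernel would synchronize the block here.)
                for ty in range(T):
                    for tx in range(T):
                        # Thread (tx, ty) accumulates a partial dot product.
                        acc[ty][tx] += sum(a_tile[ty][kk] * b_tile[kk][tx]
                                           for kk in range(T))
                # (...and again before the next tile overwrites shared memory.)
            # Each thread writes its single element of C.
            for ty in range(T):
                for tx in range(T):
                    C[block_y * T + ty][block_x * T + tx] = acc[ty][tx]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(tiled_matmul(A, B, tile_size=1))  # [[19, 22], [43, 50]]
```

The outer loops walk the grid with the N-direction block index on x, matching the (N/tile_size, M/tile_size, 1) grid shape. Each tile of A and B is read from global memory once per block and then reused tile_size times from shared memory, which is the point of the technique.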