Mojo package
gpu
Packages
- amd: Provides the AMD GPU backend implementations for matmuls.
- sm100: Provides the Nvidia Blackwell backend implementations for matmuls.
- sm80: Provides the Nvidia Ampere backend implementations for matmuls.
- sm90: Provides the Nvidia Hopper backend implementations for matmuls.
Modules
Functions
- matmul_kernel: Matrix multiplication using shared memory. This version loads blocks of size tile_size x tile_size from A and B and updates a tile_size x tile_size tile in C. The thread block should have shape (tile_size, tile_size, 1), with each thread mapped to one element in C. The grid should have shape (N/tile_size, M/tile_size, 1). N is mapped to the first grid dimension so that memory accesses are coalesced.
- matmul_kernel_naive
- multistage_gemm
- split_k_reduce
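
The tiling scheme described for matmul_kernel can be illustrated with a small CPU-side simulation. This is a sketch only, not the Mojo kernel or its signature: the function name `tiled_matmul` is hypothetical, and it assumes A is M x K, B is K x N, and that M, N, and K are all multiples of tile_size. The loops over blocks and threads stand in for what a GPU would run in parallel.

```python
def tiled_matmul(a, b, tile_size):
    """Simulate a shared-memory tiled matmul on the CPU (illustrative only).

    a is an M x K list of lists, b is K x N; returns C as M x N.
    Assumes M, N, and K are all multiples of tile_size.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    assert m % tile_size == 0 and n % tile_size == 0 and k % tile_size == 0

    c = [[0] * n for _ in range(m)]

    # Grid shape (N/tile_size, M/tile_size, 1): x covers N, y covers M.
    grid = (n // tile_size, m // tile_size, 1)

    for block_y in range(grid[1]):
        for block_x in range(grid[0]):
            row0 = block_y * tile_size  # first row of this block's C tile
            col0 = block_x * tile_size  # first column of this block's C tile

            # One accumulator per thread of the (tile_size, tile_size, 1) block.
            acc = [[0] * tile_size for _ in range(tile_size)]

            # Walk along K one tile at a time.
            for k0 in range(0, k, tile_size):
                # Stand-in for shared memory: thread (tx, ty) copies one
                # element of the A tile and one element of the B tile.
                a_tile = [[a[row0 + ty][k0 + tx] for tx in range(tile_size)]
                          for ty in range(tile_size)]
                b_tile = [[b[k0 + ty][col0 + tx] for tx in range(tile_size)]
                          for ty in range(tile_size)]

                # On a GPU a barrier would separate the load from the compute.
                for ty in range(tile_size):
                    for tx in range(tile_size):
                        for kk in range(tile_size):
                            acc[ty][tx] += a_tile[ty][kk] * b_tile[kk][tx]

            # Each thread writes its single element of C.
            for ty in range(tile_size):
                for tx in range(tile_size):
                    c[row0 + ty][col0 + tx] = acc[ty][tx]

    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8]]          # M = 2, K = 4
b = [[1, 0], [0, 1], [2, 0], [0, 2]]      # K = 4, N = 2
print(tiled_matmul(a, b, tile_size=2))    # grid is (1, 1, 1), two K tiles
```

In this sketch the thread x index runs along N, so adjacent threads touch adjacent columns of B and C. That is the access pattern the description refers to as coalesced.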