Mojo module

matmul_gpu

Functions

  • __nvvm_ldg_f4:
  • matmul_kernel: Matrix multiplication using shared memory. This version loads blocks of size tile_size x tile_size from A and B and updates a tile_size x tile_size tile of C. The thread block should have shape (tile_size, tile_size, 1), and each thread is mapped to one element of C. The grid should have shape (N/tile_size, M/tile_size, 1); N is the first grid dimension so that memory accesses are coalesced.
  • matmul_kernel_naive:
  • multistage_gemm:
  • split_k_reduce:
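The shared-memory tiling scheme described for matmul_kernel can be illustrated with a small CPU emulation. This is a hedged sketch in Python, not Mojo and not this module's API. It assumes row-major A (M x K), B (K x N) and C (M x N), and that the x index of the grid and block walks the N (column) dimension of C, which matches the grid shape (N/tile_size, M/tile_size, 1). The names `tiled_matmul` and `_ceil_div` are hypothetical. As a further assumption, the grid is rounded up and out-of-range tile elements are padded with zeros, so sizes need not be multiples of tile_size.

```python
def _ceil_div(a, b):
    """Number of tiles of size b needed to cover a elements."""
    return (a + b - 1) // b


def tiled_matmul(A, B, tile_size):
    """CPU emulation of a shared-memory tiled GEMM computing C = A @ B.

    A is M x K and B is K x N, both row-major lists of lists.
    Each (block_x, block_y) pair plays the role of one thread block and
    each (tx, ty) pair plays the role of one thread owning one element of C.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    assert len(B) == K, "inner dimensions must match"
    T = tile_size
    C = [[0.0] * N for _ in range(M)]

    # Grid shape (N/T, M/T, 1): block_x covers columns of C, block_y covers rows.
    for block_y in range(_ceil_div(M, T)):
        for block_x in range(_ceil_div(N, T)):
            # One private accumulator per thread in the (T, T, 1) block.
            acc = [[0.0] * T for _ in range(T)]
            for k_tile in range(_ceil_div(K, T)):
                # Emulated shared memory: each thread loads one element of A and one of B.
                a_sh = [[0.0] * T for _ in range(T)]
                b_sh = [[0.0] * T for _ in range(T)]
                for ty in range(T):
                    for tx in range(T):
                        row = block_y * T + ty
                        col = block_x * T + tx
                        a_col = k_tile * T + tx
                        b_row = k_tile * T + ty
                        if row < M and a_col < K:
                            a_sh[ty][tx] = A[row][a_col]
                        if b_row < K and col < N:
                            b_sh[ty][tx] = B[b_row][col]
                # A real kernel needs a block-wide barrier here so every load is visible.
                for ty in range(T):
                    for tx in range(T):
                        for k in range(T):
                            acc[ty][tx] += a_sh[ty][k] * b_sh[k][tx]
                # ...and another barrier before the next tile overwrites shared memory.
            # Each thread writes its single element of C if it is in bounds.
            for ty in range(T):
                for tx in range(T):
                    row, col = block_y * T + ty, block_x * T + tx
                    if row < M and col < N:
                        C[row][col] = acc[ty][tx]
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(A, B, 2))
```

Because consecutive `tx` values read consecutive columns of B and write consecutive columns of C, neighbouring threads touch neighbouring addresses in a row-major layout. That is the coalescing reason for putting N in the first grid dimension. Each element of A and B is also read from global memory once per tile rather than once per output element.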