Mojo function

store_matrix_d

store_matrix_d[dtype: DType, //, m: Int, n: Int, k: Int](d_ptr: UnsafePointer[SIMD[dtype, 1]], d: SIMD[dtype, 4], tile_row: Int, tile_col: Int, ldm: Int)

Stores a tile of matrix D from registers to memory after a tensor core operation.

This function dispatches to architecture-specific implementations for storing the results of a tensor core matrix multiply-accumulate operation. It handles the different memory layouts required by NVIDIA and AMD tensor cores.

Note:

  • Automatically selects the appropriate implementation based on GPU architecture.
  • Each thread stores 4 elements in architecture-specific positions.
  • Must be called by all threads in a warp.

Parameters:

  • dtype (DType): Data type of the matrix elements.
  • m (Int): Number of rows in matrix D.
  • n (Int): Number of columns in matrix D.
  • k (Int): Inner dimension for matrix multiply.

Args:

  • d_ptr (UnsafePointer[SIMD[dtype, 1]]): Pointer to destination memory for matrix D.
  • d (SIMD[dtype, 4]): SIMD vector containing 4 elements to store.
  • tile_row (Int): Starting row index of the tile in matrix D.
  • tile_col (Int): Starting column index of the tile in matrix D.
  • ldm (Int): Leading dimension (stride) of matrix D.
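
To make the roles of tile_row, tile_col, and ldm concrete, here is a minimal host-side model in Python, not Mojo and not the library's implementation. It assumes matrix D is stored row-major, that ldm is measured in elements, and that tile_row and tile_col are element indices. None of this is stated above. The helper names `element_offset` and `tile_offsets` are hypothetical. The sketch shows only where a tile lands in flat memory; it does not model which 4 elements each thread writes, since those positions are architecture-specific.

```python
# Hypothetical helpers; they are not part of the Mojo standard library.
# Assumption: row-major storage, with ldm counted in elements.

def element_offset(row, col, ldm):
    """Flat offset of element (row, col) in a matrix with leading dimension ldm."""
    return row * ldm + col


def tile_offsets(tile_row, tile_col, m, n, ldm):
    """Flat offsets of every element in an m x n tile whose top-left
    corner sits at (tile_row, tile_col)."""
    return [
        [element_offset(tile_row + i, tile_col + j, ldm) for j in range(n)]
        for i in range(m)
    ]


if __name__ == "__main__":
    # A 2 x 2 tile starting at row 16, column 8 of a matrix with ldm = 64.
    for row in tile_offsets(16, 8, 2, 2, 64):
        print(row)
```

Under these assumptions, consecutive rows of a tile are ldm elements apart in memory, which is why ldm must be the stride of the full matrix D rather than the tile width n.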