Mojo function
store_matrix_d
store_matrix_d[dtype: DType, //, m: Int, n: Int, k: Int](d_ptr: UnsafePointer[SIMD[dtype, 1]], d: SIMD[dtype, 4], tile_row: Int, tile_col: Int, ldm: Int)
Stores a tile of matrix D from registers to memory after a tensor core operation.
This function dispatches to architecture-specific implementations for storing the results of a tensor core matrix multiply-accumulate operation. It handles the different memory layouts required by NVIDIA and AMD tensor cores.
Note:
- Automatically selects the appropriate implementation based on the GPU architecture.
- Each thread stores 4 elements in architecture-specific positions.
- Must be called by all threads in a warp.
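The following Python sketch makes the per-thread layout concrete for NVIDIA hardware. It models only the index arithmetic, not the Mojo implementation. It assumes the accumulator fragment layout that the PTX ISA documents for mma.m16n8k8 and mma.m16n8k16 with 32-bit accumulators (groupID = lane / 4, threadID_in_group = lane % 4). Whether store_matrix_d uses exactly this mapping is an assumption, and nvidia_m16n8_positions is a hypothetical helper. The sketch also shows why every lane must participate: only all 32 lanes together cover the 16x8 tile.

```python
# Sketch (assumption): PTX mma.m16n8k* f32 accumulator layout, modeled in Python.
# Maps each lane of a 32-lane warp to the (row, col) positions of its 4 values.

WARP_SIZE = 32

def nvidia_m16n8_positions(lane: int) -> list[tuple[int, int]]:
    """Return tile-local (row, col) positions for d[0..3] held by `lane`."""
    group_id = lane >> 2          # selects rows 0..7 (and 8..15 for d[2], d[3])
    thread_in_group = lane % 4    # selects a pair of adjacent columns
    col = thread_in_group * 2
    return [
        (group_id, col),          # d[0]
        (group_id, col + 1),      # d[1]
        (group_id + 8, col),      # d[2]
        (group_id + 8, col + 1),  # d[3]
    ]

# Only the full warp covers all 16 * 8 = 128 elements of the tile.
covered = {pos for lane in range(WARP_SIZE) for pos in nvidia_m16n8_positions(lane)}
assert covered == {(r, c) for r in range(16) for c in range(8)}

print(nvidia_m16n8_positions(5))  # [(1, 2), (1, 3), (9, 2), (9, 3)]
```

If a single lane skipped the call, its four positions would never be written, leaving holes in the tile.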
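Parameters:
- dtype (DType): Data type of the matrix elements. This is an infer-only parameter (it precedes // in the signature), so it is deduced from the types of d_ptr and d.
- m (Int): Number of rows in the matrix D tile.
- n (Int): Number of columns in the matrix D tile.
- k (Int): Inner (reduction) dimension of the matrix multiply.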
Args:
- d_ptr (UnsafePointer[SIMD[dtype, 1]]): Pointer to destination memory for matrix D.
- d (SIMD[dtype, 4]): SIMD vector containing 4 elements to store.
- tile_row (Int): Starting row index of the tile in matrix D.
- tile_col (Int): Starting column index of the tile in matrix D.
- ldm (Int): Leading dimension (stride) of matrix D.
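The Python sketch below shows one way tile_row, tile_col, and ldm could combine into a destination address. It assumes D is stored row-major and that ldm is counted in elements, which is consistent with d_ptr pointing to scalar SIMD[dtype, 1] values. The helpers d_offset and store_fragment and the lane-0 positions are hypothetical.

```python
# Sketch (assumption): row-major D with leading dimension ldm in elements.
# Not the Mojo implementation; models only the address arithmetic.

def d_offset(tile_row: int, tile_col: int, r: int, c: int, ldm: int) -> int:
    """Flat element offset of D[tile_row + r, tile_col + c]."""
    return (tile_row + r) * ldm + (tile_col + c)

def store_fragment(d_mem, d, positions, tile_row, tile_col, ldm):
    """Write the 4 values in `d` to `d_mem` at tile-local `positions`."""
    for value, (r, c) in zip(d, positions):
        d_mem[d_offset(tile_row, tile_col, r, c, ldm)] = value

ldm = 32
d_mem = [0.0] * (32 * ldm)  # a 32x32 row-major D buffer
lane0_positions = [(0, 0), (0, 1), (8, 0), (8, 1)]  # hypothetical lane-0 layout
store_fragment(d_mem, [1.0, 2.0, 3.0, 4.0], lane0_positions,
               tile_row=16, tile_col=8, ldm=ldm)
print([i for i, v in enumerate(d_mem) if v != 0.0])  # [520, 521, 776, 777]
```

With ldm = 32, each step down one row of D advances the offset by 32 elements. That is why tile rows 0 and 8 land 256 elements apart.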