
Mojo function

gemm_kernel_rdna

gemm_kernel_rdna[c_type: DType, a_type: DType, b_type: DType, c_layout: Layout, a_layout: Layout, b_layout: Layout, transpose_b: Bool = True, elementwise_lambda_fn: Optional[elementwise_epilogue_type] = None, s_type: DType = get_accum_type[c_type]()](c: LayoutTensor[c_type, c_layout, MutAnyOrigin], a: LayoutTensor[a_type, a_layout, ImmutAnyOrigin], b: LayoutTensor[b_type, b_layout, ImmutAnyOrigin], m: Int, n: Int, k: Int)

GEMM kernel for AMD RDNA GPUs.

On RDNA 3 and newer (gfx11xx/gfx12xx), the kernel uses 16x16x16 WMMA instructions with shared-memory tiling. On older RDNA (gfx10xx), it falls back to a naive matmul in which each thread iterates over the K dimension with scalar accumulation.
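
The two code paths described above can be modeled on the CPU to make the arithmetic concrete. The following is a minimal sketch in plain Python, not the kernel's implementation and not Mojo GPU code. The names `naive_gemm` and `tiled_gemm` are hypothetical. It assumes that `transpose_b=True` means `b` is stored as an N x K matrix (so `c[i][j]` sums `a[i][kk] * b[j][kk]`), and that the naive fallback assigns one output element to each thread; neither detail is stated on this page. The tiled function only mirrors the 16x16x16 blocking of the loop nest; it does not model WMMA instructions or shared memory.

```python
TILE = 16  # tile edge taken from the 16x16x16 WMMA shape in the description


def naive_gemm(a, b, m, n, k, transpose_b=True):
    """Model of the fallback path: scalar accumulation over K.

    Assumption: each (row, col) pair stands in for one GPU thread.
    Assumption: with transpose_b=True, b is stored as N x K.
    """
    c = [[0.0] * n for _ in range(m)]
    for row in range(m):
        for col in range(n):
            acc = 0.0  # scalar accumulator held by the "thread"
            for kk in range(k):
                b_val = b[col][kk] if transpose_b else b[kk][col]
                acc += a[row][kk] * b_val
            c[row][col] = acc
    return c


def tiled_gemm(a, b, m, n, k, transpose_b=True, tile=TILE):
    """Model of the blocked loop order: tile x tile x tile steps.

    Each innermost group of three loops corresponds to one tile
    multiply-accumulate; min() clamps handle ragged edges.
    """
    c = [[0.0] * n for _ in range(m)]
    for row0 in range(0, m, tile):
        for col0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                for row in range(row0, min(row0 + tile, m)):
                    for col in range(col0, min(col0 + tile, n)):
                        acc = c[row][col]
                        for kk in range(k0, min(k0 + tile, k)):
                            b_val = b[col][kk] if transpose_b else b[kk][col]
                            acc += a[row][kk] * b_val
                        c[row][col] = acc
    return c
```

Both functions compute the same product; only the traversal order differs. For floating-point inputs the tiled order can change rounding slightly, because partial sums are accumulated per K tile.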
