Mojo function
cdna4_block_scaled_mfma
cdna4_block_scaled_mfma[a_scale_byte_index: Int32, b_scale_byte_index: Int32, a_matrix_format: CDNA4F8F6F4MatrixFormat, b_matrix_format: CDNA4F8F6F4MatrixFormat](mut d: SIMD[DType.float32, d.size], a: SIMD[DType.uint8, 32], b: SIMD[DType.uint8, 32], packed_scale_word_a: Int32, packed_scale_word_b: Int32)
Executes a CDNA4 f8f6f4 block-scaled MFMA, inferring the MMA shape from the accumulator width (16 lanes -> 32x32x64, 4 lanes -> 16x16x128).
a_scale_byte_index and b_scale_byte_index each select one byte (index 0 to 3) from the packed 32-bit E8M0 scale words packed_scale_word_a and packed_scale_word_b, respectively.
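The byte selection described above can be modeled on the host. The following is a minimal Python sketch, not the Mojo API or the hardware implementation. The helper names `select_scale_byte` and `decode_e8m0` are hypothetical, and the sketch assumes that byte index 0 refers to the least significant 8 bits of the packed word, which this page does not state. The E8M0 decoding follows the usual definition of that format: an 8-bit biased exponent with bias 127, where 0xFF encodes NaN.

```python
import math


def select_scale_byte(packed_word: int, byte_index: int) -> int:
    """Return one byte (0..255) of a packed 32-bit scale word.

    Assumption: byte_index 0 is the least significant byte.
    """
    if not 0 <= byte_index <= 3:
        raise ValueError("byte_index must be in 0..3")
    # Mask to 32 bits first so a negative Int32 value is handled correctly.
    return ((packed_word & 0xFFFFFFFF) >> (8 * byte_index)) & 0xFF


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 scale byte to its value 2**(scale_byte - 127)."""
    if scale_byte == 0xFF:
        return math.nan  # 0xFF is the NaN encoding in E8M0
    return math.ldexp(1.0, scale_byte - 127)


# Hypothetical packed word holding four scales: 0x81, 0x7F, 0x80, 0x7E
# in bytes 0 through 3 (under the byte-order assumption above).
packed = 0x7E807F81
scales = [decode_e8m0(select_scale_byte(packed, i)) for i in range(4)]
```

Under these assumptions, a model like this can be used to check on the host which scale factor a given a_scale_byte_index or b_scale_byte_index would pick before passing the packed word to the kernel.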