
Mojo function

cdna4_block_scaled_mfma

def cdna4_block_scaled_mfma[a_scale_byte_index: Int32, b_scale_byte_index: Int32, a_matrix_format: CDNA4F8F6F4MatrixFormat, b_matrix_format: CDNA4F8F6F4MatrixFormat](mut d: SIMD[DType.float32], a: SIMD[DType.uint8], b: SIMD[DType.uint8], packed_scale_word_a: Int32, packed_scale_word_b: Int32)

Executes a CDNA4 f8f6f4 block-scaled MFMA. The MMA shape is inferred from the width of the accumulator d: 16 lanes select 32x32x64, and 4 lanes select 16x16x128.
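The width-to-shape mapping above can be modeled on the host in a few lines of Python. This is only an illustrative sketch, not part of the Mojo API: `infer_mma_shape`, `SHAPES`, and `WAVEFRONT_SIZE` are hypothetical names, and the consistency check assumes the M x N output tile is spread evenly across a 64-lane CDNA wavefront, which is one reading of why 16 and 4 are the two supported widths.

```python
WAVEFRONT_SIZE = 64  # assumption: one MFMA is issued by a 64-lane CDNA wavefront

# Accumulator width -> (M, N, K), copied from the description above.
SHAPES = {16: (32, 32, 64), 4: (16, 16, 128)}


def infer_mma_shape(acc_width: int) -> tuple[int, int, int]:
    """Hypothetical helper: return (M, N, K) for a given accumulator width."""
    if acc_width not in SHAPES:
        raise ValueError(f"unsupported accumulator width: {acc_width}")
    return SHAPES[acc_width]


# Consistency check under the stated assumption: the M x N float32 results
# divide evenly so that each wavefront lane holds `acc_width` of them.
for width, (m, n, _k) in SHAPES.items():
    assert m * n == width * WAVEFRONT_SIZE
```

Any width other than 16 or 4 raises an error in this sketch; how the real function handles an unsupported width is not stated in the documentation above.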

a_scale_byte_index and b_scale_byte_index each select one byte (0 to 3) of the corresponding packed 32-bit word, packed_scale_word_a or packed_scale_word_b. Each word packs four E8M0 scale values, one per byte.
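The byte selection described above can be sketched in plain Python to make the arithmetic concrete. This is a host-side model under stated assumptions, not the hardware behavior or a Mojo API: `select_scale_byte` and `decode_e8m0` are hypothetical helpers, byte 0 is assumed to be the least significant byte of the word, and E8M0 is decoded as in the OCP Microscaling format (an 8-bit exponent with bias 127, where 0xFF encodes NaN).

```python
import math


def select_scale_byte(packed_word: int, byte_index: int) -> int:
    """Hypothetical helper: pick byte 0..3 from a packed 32-bit scale word.

    Assumption: byte 0 is the least significant byte (bits 7:0).
    """
    if not 0 <= byte_index <= 3:
        raise ValueError("byte_index must be in 0..3")
    # Mask to 32 bits first so a negative Int32 value is read as its raw bits.
    return ((packed_word & 0xFFFFFFFF) >> (8 * byte_index)) & 0xFF


def decode_e8m0(byte: int) -> float:
    """Decode an E8M0 scale: 2**(byte - 127), with 0xFF reserved for NaN."""
    if byte == 0xFF:
        return math.nan
    return 2.0 ** (byte - 127)


# Four scales packed into one word: bytes 3..0 are 0x7F, 0x80, 0x81, 0x82.
word = 0x7F808182
scales = [decode_e8m0(select_scale_byte(word, i)) for i in range(4)]
```

With this packing, `scales` holds the powers of two 8.0, 4.0, 2.0, and 1.0 for byte indices 0 through 3. Packing four scales per word means one 32-bit value can serve several calls that differ only in the compile-time byte index.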