Mojo module
hk_mha_mma_op
MHA MMA operator: shape constants, SMEM→register loaders, and MFMA dispatch used by HKMhaPrefill.
Specialized for v_mfma_f32_32x32x16_bf16 (BF16 inputs, FP32 accumulation)
on gfx950. The K side uses the two-XOR worker-index swizzle and
ds_read_b128; the V side uses the identity swizzle and the
ds_read_tr16_b64_warp transpose-load.
comptime values
ACC_ROW_OFFSETS_32x32
comptime ACC_ROW_OFFSETS_32x32 = SIMD(Int32(0), Int32(1), Int32(2), Int32(3), Int32(8), Int32(9), Int32(10), Int32(11), Int32(16), Int32(17), Int32(18), Int32(19), Int32(24), Int32(25), Int32(26), Int32(27), __list_literal__=NoneType(None))
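The sixteen values above follow a regular pattern: four runs of four consecutive offsets, with each run starting 8 higher than the previous one. The sketch below is plain Python for illustration, not code from this module, and `acc_row_offsets_32x32` is a hypothetical helper name. Reading the values as per-lane row offsets into a 32x32 accumulator tile is an assumption based on the constant's name; the source does not state it.

```python
# Hypothetical helper, not part of hk_mha_mma_op: rebuilds the offset list
# from its visible pattern (4 runs of 4 consecutive values, run stride 8).
def acc_row_offsets_32x32():
    # i // 4 selects the run (0..3); i % 4 is the position inside the run.
    return [8 * (i // 4) + (i % 4) for i in range(16)]


offsets = acc_row_offsets_32x32()
print(offsets)
```

The closed form `8 * (i // 4) + (i % 4)` makes it easy to check the literal list for typos, since all sixteen entries can be compared against a two-term expression.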
Structs
- HKMhaConfig: Shape configuration for HKMhaPrefill.
- MhaMmaOp: Namespace-style struct holding the shape constants, register-tile layouts, and SMEM→register loaders for HKMhaPrefill. All call sites go through static methods on this struct.