Mojo function
mi355x_single_buffer_cost_model
mi355x_single_buffer_cost_model() -> TargetCostModel
MI355X cost model for single-buffer matmul (DefaultMatmulOps tags).
Tag mapping (from DefaultMatmulOps), given as tag = op: resource, latency, role:
- 0 = LOAD_DRAM: GLOBAL_MEM, 200 cycles, GLOBAL_LOAD
- 1 = STORE_SMEM: LDS, 20 cycles, SHARED_STORE
- 2 = LOAD_FRAG: LDS, 20 cycles, FRAGMENT_LOAD
- 3 = COMPUTE: MMA_UNIT, 64 cycles, COMPUTE
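To make the tag mapping concrete, here is a minimal Python sketch (not the Mojo API) that encodes the four entries as a lookup table and derives two simple quantities from them. The names `OpCost`, `TAG_COSTS`, `serial_latency`, and `resource_busy_cycles` are hypothetical and exist only for illustration; the actual layout of `TargetCostModel` is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpCost:
    """Hypothetical record for one row of the tag mapping."""
    name: str
    resource: str
    latency_cycles: int
    role: str


# Values copied from the tag mapping above; the container itself is an assumption.
TAG_COSTS = {
    0: OpCost("LOAD_DRAM", "GLOBAL_MEM", 200, "GLOBAL_LOAD"),
    1: OpCost("STORE_SMEM", "LDS", 20, "SHARED_STORE"),
    2: OpCost("LOAD_FRAG", "LDS", 20, "FRAGMENT_LOAD"),
    3: OpCost("COMPUTE", "MMA_UNIT", 64, "COMPUTE"),
}


def serial_latency(tags):
    """Sum of latencies if the tagged ops ran strictly back to back."""
    return sum(TAG_COSTS[t].latency_cycles for t in tags)


def resource_busy_cycles(tags):
    """Cycles attributed to each resource, keyed by resource name."""
    busy = {}
    for t in tags:
        cost = TAG_COSTS[t]
        busy[cost.resource] = busy.get(cost.resource, 0) + cost.latency_cycles
    return busy


one_iteration = [0, 1, 2, 3]  # load, stage to LDS, load fragment, compute
print(serial_latency(one_iteration))        # 304
print(resource_busy_cycles(one_iteration))  # GLOBAL_MEM 200, LDS 40, MMA_UNIT 64
```

With no overlap, one pass through all four ops costs 200 + 20 + 20 + 64 = 304 cycles under these numbers, and the global load accounts for most of it.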
VGPR liveness hints (BF16 32x32x16 MFMA on MI355X):
- LOAD_DRAM: 0 (buffer_load_lds writes directly to LDS)
- STORE_SMEM: 0 (ds_write consumes existing register data)
- LOAD_FRAG: vgpr_def=4 (one 32x32 frag is 4 VGPRs/lane)
- COMPUTE: vgpr_def=16 (32x32 fp32 accumulator = 16 VGPRs/lane)
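The per-lane register counts can be checked with a little arithmetic. The Python sketch below (again, not the Mojo API) assumes a 64-lane wavefront and 32-bit VGPRs, and it assumes the operand fragment is the 32x16 BF16 tile implied by the 32x32x16 MFMA shape; none of these assumptions are stated on this page. `vgprs_per_lane` and `VGPR_DEF` are hypothetical names.

```python
WAVEFRONT_LANES = 64  # assumption: wave64 execution
VGPR_BITS = 32        # assumption: one VGPR holds 32 bits per lane


def vgprs_per_lane(rows, cols, elem_bits):
    """VGPRs each lane needs to hold an evenly distributed rows x cols tile."""
    total_bits = rows * cols * elem_bits
    bits_per_lane = -(-total_bits // WAVEFRONT_LANES)  # ceiling division
    return -(-bits_per_lane // VGPR_BITS)


# Hints copied from the list above.
VGPR_DEF = {"LOAD_DRAM": 0, "STORE_SMEM": 0, "LOAD_FRAG": 4, "COMPUTE": 16}

# 32x32 fp32 accumulator: 1024 values / 64 lanes = 16 values, one VGPR each.
print(vgprs_per_lane(32, 32, 32))  # 16

# 32x16 BF16 operand tile (assumed shape): 512 values / 64 lanes = 8 values,
# packed two per VGPR.
print(vgprs_per_lane(32, 16, 16))  # 4
```

Under these assumptions both results agree with the listed `vgpr_def` values for COMPUTE and LOAD_FRAG.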
Returns: TargetCostModel