Mojo function
mi355x_cost_model
mi355x_cost_model() -> TargetCostModel
MI355X cost model: production-tuned latencies.
Global loads (LOAD_A, LOAD_B): GLOBAL_MEM, 200 cycles, GLOBAL_LOAD
Fragment loads (MMA_LOAD_A, MMA_LOAD_B): LDS, 20 cycles, FRAGMENT_LOAD
MMA compute (COMPUTE, MMA): MMA_UNIT, 16 cycles, COMPUTE
Op tags are kernel-specific (defined in PingPongOps / DefaultMatmulOps):
Ping-pong: 0=LOAD_A, 1=LOAD_B, 2=COMPUTE, 3=MMA_LOAD_A, 4=MMA_LOAD_B, 5=MMA
Default: 0=LOAD_DRAM, 1=STORE_SMEM, 2=LOAD_FRAG, 3=COMPUTE
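To make the relationship between op tags and latencies concrete, the ping-pong entries above can be sketched as a plain lookup table. This is a standalone Python illustration, not the Mojo API: the names OpCost, PING_PONG_TAGS, COST_BY_OP, and ping_pong_cost are hypothetical, and only the resources, cycle counts, roles, and tag numbers come from this page. The default-kernel tags are left out because the page does not state which latency applies to each of them.

```python
from typing import NamedTuple


class OpCost(NamedTuple):
    """Hypothetical record for one row of the latency list on this page."""
    resource: str  # hardware unit the op occupies
    latency: int   # latency in cycles
    role: str      # role label as written on this page


# The three cost classes listed on this page.
GLOBAL_LOAD = OpCost("GLOBAL_MEM", 200, "GLOBAL_LOAD")
FRAGMENT_LOAD = OpCost("LDS", 20, "FRAGMENT_LOAD")
COMPUTE = OpCost("MMA_UNIT", 16, "COMPUTE")

# Ping-pong op tags as listed on this page (tag number -> op name).
PING_PONG_TAGS = {
    0: "LOAD_A",
    1: "LOAD_B",
    2: "COMPUTE",
    3: "MMA_LOAD_A",
    4: "MMA_LOAD_B",
    5: "MMA",
}

# Op name -> cost class, following the grouping in the latency list.
COST_BY_OP = {
    "LOAD_A": GLOBAL_LOAD,
    "LOAD_B": GLOBAL_LOAD,
    "MMA_LOAD_A": FRAGMENT_LOAD,
    "MMA_LOAD_B": FRAGMENT_LOAD,
    "COMPUTE": COMPUTE,
    "MMA": COMPUTE,
}


def ping_pong_cost(tag: int) -> OpCost:
    """Return the cost entry for a ping-pong op tag (raises KeyError if unknown)."""
    return COST_BY_OP[PING_PONG_TAGS[tag]]


if __name__ == "__main__":
    for tag, name in PING_PONG_TAGS.items():
        cost = ping_pong_cost(tag)
        print(f"{tag} {name}: {cost.resource}, {cost.latency} cycles, {cost.role}")
```

Because the tags are kernel-specific, the same tag number means different ops in different kernels (tag 2 is COMPUTE in the ping-pong kernel but LOAD_FRAG in the default kernel), so any such table has to be keyed per kernel.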
Returns:
TargetCostModel