Mojo function

mi355x_cost_model

mi355x_cost_model() -> TargetCostModel

MI355X cost model: production-tuned latencies.

Global loads (LOAD_A, LOAD_B): GLOBAL_MEM, 200 cycles, GLOBAL_LOAD
Fragment loads (MMA_LOAD_A, MMA_LOAD_B): LDS, 20 cycles, FRAGMENT_LOAD
MMA compute (COMPUTE, MMA): MMA_UNIT, 16 cycles, COMPUTE

Op tags are kernel-specific (defined in PingPongOps / DefaultMatmulOps):
Ping-pong: 0=LOAD_A, 1=LOAD_B, 2=COMPUTE, 3=MMA_LOAD_A, 4=MMA_LOAD_B, 5=MMA
Default: 0=LOAD_DRAM, 1=STORE_SMEM, 2=LOAD_FRAG, 3=COMPUTE
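As a rough illustration of how the ping-pong op tags relate to the three latency classes listed above, the mapping can be sketched as a small lookup table. This is a Python sketch for reading purposes only, not the Mojo API: OpCost, PING_PONG_TAGS, COST_BY_OP, and ping_pong_cost are hypothetical names, and the fields of the real TargetCostModel are not documented on this page. The default-kernel tags are left out because the description does not state which latency class each of them uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpCost:
    """Hypothetical record: one latency class from the description."""
    resource: str        # hardware unit the op occupies
    latency_cycles: int  # latency quoted in the description
    role: str            # op role label quoted in the description


# The three latency classes, with values copied from the description.
GLOBAL_LOAD = OpCost("GLOBAL_MEM", 200, "GLOBAL_LOAD")
FRAGMENT_LOAD = OpCost("LDS", 20, "FRAGMENT_LOAD")
MMA_COMPUTE = OpCost("MMA_UNIT", 16, "COMPUTE")

# Ping-pong op tags, as listed for PingPongOps.
PING_PONG_TAGS = {
    0: "LOAD_A",
    1: "LOAD_B",
    2: "COMPUTE",
    3: "MMA_LOAD_A",
    4: "MMA_LOAD_B",
    5: "MMA",
}

# Op name -> latency class, following the grouping in the description.
COST_BY_OP = {
    "LOAD_A": GLOBAL_LOAD,
    "LOAD_B": GLOBAL_LOAD,
    "MMA_LOAD_A": FRAGMENT_LOAD,
    "MMA_LOAD_B": FRAGMENT_LOAD,
    "COMPUTE": MMA_COMPUTE,
    "MMA": MMA_COMPUTE,
}


def ping_pong_cost(tag: int) -> OpCost:
    """Look up the latency class for a ping-pong op tag (hypothetical helper)."""
    return COST_BY_OP[PING_PONG_TAGS[tag]]
```

Under these assumptions, ping_pong_cost(3) resolves tag 3 to MMA_LOAD_A and returns the LDS class with a 20-cycle latency, while tags 0 and 1 both resolve to the 200-cycle GLOBAL_MEM class.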

Returns:

TargetCostModel