Mojo function
chiplet_transform_chunked
chiplet_transform_chunked[num_xcds: Int, chunk_size: Int](workgroup_id: Int, num_workgroups: Int) -> Int
Transforms a workgroup ID for better chiplet (XCD) locality.
AMD MI300X and MI355X GPUs have 8 XCDs (chiplets), each with its own L2 cache. This function remaps block IDs from a round-robin distribution across XCDs to a chunked allocation, so that consecutive blocks run on the same XCD and share its L2 cache.
Original pattern: WG0→XCD0, WG1→XCD1, ..., WG8→XCD0
Transformed: WG0-63→XCD0, WG64-127→XCD1, etc.
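The remapping described above can be modeled with plain integer arithmetic. The sketch below is written in Python rather than Mojo so it can be run without a GPU toolchain, and it is not the library's implementation: the function name `chiplet_transform_chunked_model`, the treatment of `num_xcds` and `chunk_size` as ordinary arguments, and the choice to leave a trailing partial group of workgroups unchanged are all assumptions made for illustration. It assumes the hardware assigns workgroup `i` to XCD `i % num_xcds`, as in the original pattern above.

```python
def chiplet_transform_chunked_model(workgroup_id, num_workgroups, num_xcds, chunk_size):
    """Hypothetical model: map a hardware workgroup ID to a logical block ID."""
    # One full group covers chunk_size blocks on each of the num_xcds XCDs.
    group_size = num_xcds * chunk_size

    # Assumption: workgroups past the last full group keep their original ID.
    remapped_limit = (num_workgroups // group_size) * group_size
    if workgroup_id >= remapped_limit:
        return workgroup_id

    xcd = workgroup_id % num_xcds        # XCD chosen by round-robin dispatch
    local_id = workgroup_id // num_xcds  # index among workgroups on that XCD
    chunk_idx = local_id // chunk_size   # which full group this workgroup is in
    pos_in_chunk = local_id % chunk_size # position inside the XCD's chunk

    # Logical IDs are laid out group by group, then XCD by XCD within a group.
    return chunk_idx * group_size + xcd * chunk_size + pos_in_chunk


if __name__ == "__main__":
    # With 8 XCDs and 64 blocks per chunk, hardware workgroups 0, 8, 16, ...
    # (all on XCD0) receive the consecutive logical block IDs 0, 1, 2, ...
    for wg in (0, 8, 16, 1, 9):
        print(wg, chiplet_transform_chunked_model(wg, 1024, 8, 64))
```

Under these assumptions, the workgroups that land on XCD0 are handed logical blocks 0-63, those on XCD1 are handed blocks 64-127, and so on, which matches the transformed pattern above for a chunk size of 64.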
Parameters:
- num_xcds (Int): Number of XCDs (8 for MI300X/MI355X).
- chunk_size (Int): Number of blocks per XCD chunk.
Args:
- workgroup_id (Int): Original workgroup ID.
- num_workgroups (Int): Total number of workgroups.
Returns:
Int: Transformed block ID for better XCD locality.