
Mojo function

chiplet_transform_chunked

chiplet_transform_chunked[num_xcds: Int, chunk_size: Int](workgroup_id: Int, num_workgroups: Int) -> Int

Transforms a workgroup ID for better chiplet locality.

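AMD MI300X/MI355X GPUs have 8 XCDs (chiplets), each with its own L2 cache. Workgroups are normally distributed across XCDs in round-robin order, so neighboring block IDs run on different chiplets and cannot share an L2 cache. This function remaps block IDs from that round-robin distribution to a chunked allocation. Consecutive block IDs then run on the same XCD, which improves L2 cache locality.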
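
The following is a small Python model of the default distribution. It illustrates the locality problem under one assumption: workgroup `i` is dispatched to XCD `i % num_xcds`, which matches the round-robin pattern described on this page. The helper names `round_robin_xcd` and `xcds_touched` are hypothetical and are not part of the Mojo API.

```python
def round_robin_xcd(workgroup_id: int, num_xcds: int = 8) -> int:
    """XCD that runs a workgroup under (modeled) round-robin dispatch."""
    return workgroup_id % num_xcds


def xcds_touched(first_id: int, count: int, num_xcds: int = 8) -> set:
    """Set of XCDs that run `count` consecutive block IDs starting at `first_id`."""
    return {round_robin_xcd(wg, num_xcds) for wg in range(first_id, first_id + count)}


# 64 neighboring blocks, which often read overlapping data, end up spread
# across all 8 XCDs and therefore all 8 separate L2 caches.
print(sorted(xcds_touched(0, 64)))
```

Under this model, every XCD fetches its own copy of any data that neighboring blocks share.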

Original pattern: WG0→XCD0, WG1→XCD1, ..., WG7→XCD7, WG8→XCD0, ...

Transformed (for chunk_size = 64): WG0-63→XCD0, WG64-127→XCD1, etc.
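The following Python sketch shows one remapping that produces the transformed pattern above. It is not the Mojo implementation. It assumes round-robin dispatch (hardware workgroup `w` runs on XCD `w % num_xcds`). It returns a logical block ID so that logical IDs 0-63 all come from workgroups on XCD0, IDs 64-127 from XCD1, and so on. Tail handling is also an assumption: IDs beyond the last full `num_xcds * chunk_size` group are returned unchanged. The name `chunked_xcd_remap` is hypothetical. In the Mojo signature, `num_xcds` and `chunk_size` are compile-time parameters; here they are ordinary keyword arguments.

```python
def chunked_xcd_remap(workgroup_id: int, num_workgroups: int,
                      num_xcds: int = 8, chunk_size: int = 64) -> int:
    """Hypothetical model of a chunked XCD remap (not the library's code)."""
    # Under round-robin dispatch, this workgroup runs on XCD `xcd` and is
    # the `local`-th workgroup that XCD receives.
    xcd = workgroup_id % num_xcds
    local = workgroup_id // num_xcds

    # Remap only the region that fills whole chunks on every XCD; leave the
    # tail unchanged (assumed behavior) so the result stays a permutation.
    group = num_xcds * chunk_size
    full = (num_workgroups // group) * group
    if workgroup_id >= full:
        return workgroup_id

    chunk = local // chunk_size   # which of this XCD's chunks
    offset = local % chunk_size   # position inside that chunk
    return (chunk * num_xcds + xcd) * chunk_size + offset


if __name__ == "__main__":
    n = 1024
    # Map each logical (transformed) ID back to the XCD that runs it.
    xcd_of_logical = {chunked_xcd_remap(wg, n): wg % 8 for wg in range(n)}
    print({xcd_of_logical[i] for i in range(64)})     # logical IDs 0-63
    print({xcd_of_logical[i] for i in range(64, 128)})  # logical IDs 64-127
```

The remap is a bijection on `[0, num_workgroups)`, so every block is still computed exactly once. Only the assignment of work to chiplets changes.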

Parameters:

  • num_xcds (Int): Number of XCDs (8 for MI300X/MI355X).
  • chunk_size (Int): Number of blocks per XCD chunk.

Args:

  • workgroup_id (Int): Original block ID.
  • num_workgroups (Int): Total number of blocks.

Returns:

Int: Transformed block ID for better XCD locality.
