
Mojo module

hw_ops

TileTensor-native AMD GPU hardware operations for MHA.

This module ports the LayoutTensor-based hardware load functions from amd/utils.mojo to TileTensor. The ports use new-style layouts (from tile_layout.mojo) for thread distribution and operate on TileTensor SMEM and register tiles.

Functions:

- ds_read_tr16_b64_row: 4×16 transposed LDS read (raw rocdl intrinsic)
- ds_read_tr16_b64_warp: warp-level transposed LDS read
- tt_load_b_tr: transposed B operand load (split into halves)
- tt_load_b_tile: single MMA tile load from SMEM with swizzle
- tt_load_b: full B operand load from SMEM warp tile
- tt_copy_dram_to_sram_lds: fully TileTensor DMA (both dst and src)

Functions
