Mojo module
tile_utils
TileTensor utilities for block-aligned SMEM tiling.
The SMEM layout for AMD MHA uses blocked_product(row_major(BN, BK), row_major(1, num_repeats)), which stores num_repeats contiguous BN×BK blocks. TileTensor's tile[] assumes flat strides and cannot tile hierarchical layouts directly.
These helpers compute offsets from the known block structure and create flat TileTensor sub-views. They are correct only when tile dimensions align with block boundaries, which is always true for the MHA kernel's tile sizes.
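The offset arithmetic described above can be modeled in a few lines of Python. This is only a sketch, not the module's Mojo code: it assumes the blocked layout places element (row, col) of the logical BN × (BK · num_repeats) tile at block `col // BK`, with each BN×BK block stored row-major and blocks laid out back to back. The names `blocked_offset` and `subtile_view` are hypothetical.

```python
def blocked_offset(row, col, BN, BK):
    """Linear offset of logical element (row, col), assuming contiguous
    row-major BN x BK blocks stored one after another (an assumption)."""
    block, col_in_block = divmod(col, BK)
    return block * BN * BK + row * BK + col_in_block


def subtile_view(tile_row, tile_col, tile_rows, tile_cols, BN, BK, num_repeats):
    """Return (base_offset, (row_stride, col_stride)) for a flat sub-view.

    Hypothetical helper: the flat strides (BK, 1) are only valid when the
    tile sits entirely inside one BN x BK block.
    """
    row0 = tile_row * tile_rows
    col0 = tile_col * tile_cols
    col_last = col0 + tile_cols - 1
    if row0 + tile_rows > BN or col_last >= BK * num_repeats:
        raise ValueError("tile is out of bounds")
    if col0 // BK != col_last // BK:
        raise ValueError("tile straddles a block boundary")
    return blocked_offset(row0, col0, BN, BK), (BK, 1)


# Example: BN=4, BK=8, two blocks; 2x4 tile at tile coordinate (1, 3).
base, (row_stride, col_stride) = subtile_view(1, 3, 2, 4, BN=4, BK=8, num_repeats=2)
```

Within an aligned tile, element (i, j) sits at `base + i * row_stride + j * col_stride`, which matches `blocked_offset` on the corresponding logical coordinates. A tile that crosses a block boundary has no such flat description, which is why the sketch rejects it.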
Gap: TileTensor's _tile uses stride[i]().value(), which fails on Coord-valued strides from blocked_product. A proper fix requires TileTensor to use zipped_divide for hierarchical layouts, which in turn needs _Divide/_Multiply to handle Coords recursively. This is partially done in coord.mojo: the type algebra extension compiles, but the zipped_divide semantics need further work for floor division across hierarchy levels.
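To see why a single integer stride is not enough, the Python sketch below models the column mode of the blocked layout as a two-level (shape, stride) pair. The concrete values are assumptions chosen for illustration (BN=4, BK=8, num_repeats=2, giving column shape (8, 2) and stride (1, 32)); the function name is hypothetical and this is not the TileTensor implementation.

```python
def hierarchical_col_offset(col, shape=(8, 2), stride=(1, 32)):
    """Offset contributed by a column index under a two-level stride.

    shape  = (BK, num_repeats): columns inside a block, then block index.
    stride = (1, BN * BK): unit step inside a block, whole-block step outside.
    """
    inner, outer = col % shape[0], col // shape[0]
    return inner * stride[0] + outer * stride[1]


offsets = [hierarchical_col_offset(c) for c in range(16)]
# Distinct step sizes between neighbouring columns.
steps = {b - a for a, b in zip(offsets, offsets[1:])}
```

Here `steps` contains more than one value, because stepping from the last column of one block to the first column of the next jumps by a different amount than stepping inside a block. No single scalar stride reproduces that, so code that reads the stride as one integer cannot tile this mode directly.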
Functions
- smem_mma_subtile: Creates a flat TileTensor for an MMA-sized sub-tile in blocked SMEM.
- smem_subtile: Creates a flat TileTensor sub-view of a blocked SMEM layout.