Mojo module

tile_utils

TileTensor utilities for block-aligned SMEM tiling.

The SMEM layout for AMD MHA uses blocked_product(row_major(BN, BK), row_major(1, num_repeats)), which stores num_repeats contiguous BN×BK blocks. TileTensor's tile[] assumes flat strides and cannot tile hierarchical layouts directly.

These helpers compute offsets from the known block structure and create flat TileTensor sub-views. They are correct when tile dimensions align with block boundaries (always true for the MHA kernel's tile sizes).
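The offset computation described above can be sketched in plain Python. This is only a model of the index arithmetic, not the Mojo API: the function names `blocked_offset` and `subtile_view` and their parameters are hypothetical, and the sketch assumes each BN×BK block is row-major and that blocks are laid out one after another along the column direction, as the `blocked_product` expression suggests.

```python
def blocked_offset(row, col, BN, BK):
    """Linear SMEM offset of logical element (row, col) in the blocked layout.

    Assumed layout: num_repeats contiguous row-major BN x BK blocks placed
    side by side along the column axis.
    """
    block, inner_col = divmod(col, BK)  # which block, and the column inside it
    return block * BN * BK + row * BK + inner_col


def subtile_view(tile_row, tile_col, tile_rows, tile_cols, BN, BK, num_repeats):
    """Return (base_offset, strides) of a flat sub-view for one tile.

    Only valid when the tile never straddles a block boundary, which this
    sketch enforces by requiring the tile shape to divide the block shape.
    """
    assert BN % tile_rows == 0 and BK % tile_cols == 0, "tile must align with blocks"
    row0 = tile_row * tile_rows
    col0 = tile_col * tile_cols
    assert row0 + tile_rows <= BN, "tile row out of range"
    assert col0 + tile_cols <= BK * num_repeats, "tile column out of range"
    # Inside a single block the layout is plain row-major, so flat strides apply.
    return blocked_offset(row0, col0, BN, BK), (BK, 1)


# Example: two 4x8 blocks; take the 2x4 tile at tile coordinates (1, 3).
base, (row_stride, col_stride) = subtile_view(1, 3, 2, 4, BN=4, BK=8, num_repeats=2)
```

Because the tile sits entirely inside one block, every element of the sub-view is reachable as `base + i * row_stride + j * col_stride`. A tile that crossed a block boundary would need a jump of `BN * BK` partway along a row, which flat strides cannot express.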

Gap: TileTensor's _tile uses stride[i]().value(), which fails on the Coord-valued strides that blocked_product produces. A proper fix requires TileTensor to use zipped_divide for hierarchical layouts, which in turn needs _Divide/_Multiply to handle Coords recursively. That work is partially done in coord.mojo: the type algebra extension compiles, but the zipped_divide semantics need further work for floor division across hierarchy levels.

Functions

smem_mma_subtile: Creates a flat TileTensor for an MMA-sized sub-tile in blocked SMEM.
smem_subtile: Creates a flat TileTensor sub-view of a blocked SMEM layout.
