Mojo function

ds_read_tr16_b64_warp

def ds_read_tr16_b64_warp[mma_shape: IndexList[Int(3)]](tile: TileTensor[Storage=tile.Storage, address_space=AddressSpace.SHARED, linear_idx_type=tile.linear_idx_type]) -> SIMD[tile.dtype, SIMDSize(4)]

Performs a warp-level transposed read from LDS (shared memory), distributing the tile across 16-lane rows.

For a 32x32x16 MMA, the 16-lane rows are distributed 2x2 over an 8x32 tile. For a 16x16x32 MMA, they are distributed 4x1 over a 16x16 tile.
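The two layouts can be sanity-checked with a small arithmetic model. The sketch below is plain Python, not Mojo, and does not call the real API. It assumes that "2x2" and "4x1" mean (group rows) x (group columns) of 16-lane groups, and that each group covers an equal rectangular share of the tile. The names `LAYOUTS` and `describe` are hypothetical helpers introduced only for illustration.

```python
LANES_PER_GROUP = 16  # "16-lane rows" from the description above

# mma_shape (M, N, K) -> tile shape and grid of 16-lane groups,
# copied from the description. Reading "2x2" / "4x1" as
# (group rows, group cols) is an assumption.
LAYOUTS = {
    (32, 32, 16): {"tile": (8, 32), "groups": (2, 2)},
    (16, 16, 32): {"tile": (16, 16), "groups": (4, 1)},
}


def describe(mma_shape):
    """Derive lane count, per-group sub-tile, and elements per lane."""
    layout = LAYOUTS[mma_shape]
    tile_rows, tile_cols = layout["tile"]
    grid_rows, grid_cols = layout["groups"]

    num_groups = grid_rows * grid_cols
    lanes = num_groups * LANES_PER_GROUP
    # Assumption: the tile is split evenly among the groups.
    sub_tile = (tile_rows // grid_rows, tile_cols // grid_cols)
    elems_per_lane = (tile_rows * tile_cols) // lanes

    return {
        "lanes": lanes,
        "sub_tile": sub_tile,
        "elems_per_lane": elems_per_lane,
    }


if __name__ == "__main__":
    for shape in LAYOUTS:
        print(shape, describe(shape))
```

Under these assumptions both shapes involve 64 lanes, each 16-lane group covers a 4x16 sub-tile, and each lane receives 4 elements, which matches the 4-element SIMD return type.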

Parameters:

mma_shape (IndexList[Int(3)]): MMA instruction shape (M, N, K).

Args:

tile (TileTensor[Storage=tile.Storage, address_space=AddressSpace.SHARED, linear_idx_type=tile.linear_idx_type]): A TileTensor in shared memory sized for the MMA shape.

Returns:

SIMD[tile.dtype, SIMDSize(4)]: A SIMD[dtype, 4] vector with transposed data for one lane.