
Mojo function

smem_mma_subtile

def smem_mma_subtile[mma_rows: Int, mma_cols: Int, BN: Int, BK: Int, dtype: DType](smem_ptr: UnsafePointer[Scalar[dtype], MutAnyOrigin, address_space=AddressSpace.SHARED], bk_tile: Int, k_sub: Int, mma_idx: Int) -> TileTensor[dtype, Layout[*?, *?], MutAnyOrigin, address_space=AddressSpace.SHARED]

Creates a flat TileTensor for an MMA-sized sub-tile in blocked SMEM.

Used by the non-transposed (V buffer) load_from_shared path. The V buffer's SMEM has shape (BN, depth) and is stored in a blocked layout (num_repeats blocks of BN x BK). Each MMA tile covers mma_rows x mma_cols elements within one block. The returned TileTensor uses plain row_major[mma_rows, mma_cols] strides, which is correct only when the physical row stride equals mma_cols. When mma_cols < BK, callers must instead pair smem_mma_subtile_offset with an explicit-stride layout (e.g. MixedLayout((mma_rows, mma_cols), (BK, 1))).
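
The stride caveat above can be illustrated with plain integer arithmetic. This is a minimal sketch, not the library's implementation: strided_offset is a hypothetical helper, the values mma_cols=32 and BK=64 are chosen only for illustration, and the physical row stride is assumed to be BK, as the explicit-stride layout example suggests.

```mojo
def strided_offset(row: Int, col: Int, row_stride: Int) -> Int:
    # Hypothetical helper: element offset of (row, col) in a 2-D view
    # with the given row stride and a unit column stride.
    return row * row_stride + col


def main():
    var mma_cols = 32  # illustrative MMA tile width
    var BK = 64        # illustrative block width (assumed physical row stride)

    # Offset of sub-tile element (1, 0) under row_major[mma_rows, mma_cols].
    print(strided_offset(1, 0, mma_cols))

    # Offset of the same element under explicit strides (BK, 1).
    print(strided_offset(1, 0, BK))
```

With these values the two offsets are 32 and 64, so a row_major view would read the second row from the wrong location. When mma_cols equals BK the two offsets coincide, which is the case the returned TileTensor handles correctly.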

Parameters:

mma_rows (Int): MMA tile height (e.g., MMA_K=16).
mma_cols (Int): MMA tile width (e.g., MMA_M=32).
BN (Int): Block height.
BK (Int): Block width.
dtype (DType): Element data type.

Args:

smem_ptr (UnsafePointer[Scalar[dtype], MutAnyOrigin, address_space=AddressSpace.SHARED]): Base pointer to the SMEM allocation for this buffer stage.
bk_tile (Int): Which BK-tall row group (0..depth/BK-1).
k_sub (Int): Which MMA_K sub-row within the BK group (0..BK/MMA_K-1).
mma_idx (Int): Linear MMA tile index across the full depth dimension.
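
The index ranges listed for bk_tile and k_sub follow directly from the tile sizes. The sketch below only counts the valid indices; the helper names and the values depth=128, BK=32, MMA_K=16 are assumptions made for illustration and do not come from the library.

```mojo
def num_bk_tiles(depth: Int, BK: Int) -> Int:
    # bk_tile ranges over 0..depth/BK-1, so there are depth // BK groups.
    return depth // BK


def num_k_subs(BK: Int, MMA_K: Int) -> Int:
    # k_sub ranges over 0..BK/MMA_K-1, so there are BK // MMA_K sub-rows.
    return BK // MMA_K


def main():
    var depth = 128  # illustrative depth
    var BK = 32      # illustrative block size
    var MMA_K = 16   # illustrative MMA_K

    # Largest valid bk_tile and k_sub for these sizes.
    print(num_bk_tiles(depth, BK) - 1)
    print(num_k_subs(BK, MMA_K) - 1)
```

For these sizes bk_tile may take values 0 through 3 and k_sub values 0 through 1.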

Returns:

TileTensor[dtype, Layout[*?, *?], MutAnyOrigin, address_space=AddressSpace.SHARED]: A TileTensor view into the MMA-sized sub-tile.