Mojo function
smem_mma_subtile
smem_mma_subtile[mma_rows: Int, mma_cols: Int, BN: Int, BK: Int, dtype: DType](smem_ptr: UnsafePointer[Scalar[dtype], MutAnyOrigin, address_space=AddressSpace.SHARED], bk_tile: Int, k_sub: Int, mma_idx: Int) -> TileTensor[dtype, Layout[*?, *?], MutAnyOrigin, address_space=AddressSpace.SHARED]
Creates a flat TileTensor for an MMA-sized sub-tile in blocked SMEM.
Used by the non-transposed (V buffer) load_from_shared path. The V
buffer's SMEM has shape (BN, depth) with blocked layout
(num_repeats x BN x BK blocks). Each MMA tile is mma_rows x mma_cols
within one block. The returned TileTensor uses plain
row_major[mma_rows, mma_cols] strides, which is only correct when the
physical row stride equals mma_cols. For mma_cols < BK, callers
must pair smem_mma_subtile_offset with an explicit-stride layout
(e.g. MixedLayout((mma_rows, mma_cols), (BK, 1))).
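The stride condition above can be checked with plain index arithmetic. The following is a minimal Python sketch, not Mojo and not part of this API: the helper names `row_major_offset` and `strides_agree` are hypothetical, the tile sizes are illustrative, and it assumes the physical row stride inside a block is BK elements, as the explicit-stride layout (BK, 1) suggests.

```python
def row_major_offset(row, col, row_stride):
    """Linear element offset of (row, col) for a given row stride."""
    return row * row_stride + col


def strides_agree(mma_rows, mma_cols, bk):
    """True if plain row_major[mma_rows, mma_cols] strides address the same
    elements as explicit strides (bk, 1) for every (row, col) in the tile.

    Assumption: the physical row stride within a block is bk elements.
    """
    return all(
        row_major_offset(r, c, mma_cols) == row_major_offset(r, c, bk)
        for r in range(mma_rows)
        for c in range(mma_cols)
    )


# Illustrative sizes only: a 16 x 32 MMA tile.
print(strides_agree(16, 32, 32))  # True: the sub-tile spans the full block width
print(strides_agree(16, 32, 64))  # False: row 1 starts at offset 64, not 32
```

In the second case the row-major view would read row 1 from offset 32, which is still inside physical row 0. This is why the mma_cols < BK case needs a layout whose row stride is BK.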
Parameters:
- mma_rows (Int): MMA tile height (e.g., MMA_K=16).
- mma_cols (Int): MMA tile width (e.g., MMA_M=32).
- BN (Int): Block height.
- BK (Int): Block width.
- dtype (DType): Element data type.
Args:
- smem_ptr (UnsafePointer[Scalar[dtype], MutAnyOrigin, address_space=AddressSpace.SHARED]): Base pointer to the SMEM allocation for this buffer stage.
- bk_tile (Int): Which BK-tall row group (0..depth/BK-1).
- k_sub (Int): Which MMA_K sub-row within the BK group (0..BK/MMA_K-1).
- mma_idx (Int): Linear MMA tile index across the full depth dimension.
Returns:
TileTensor[dtype, Layout[*?, *?], MutAnyOrigin, address_space=AddressSpace.SHARED]: A TileTensor view into the MMA-sized sub-tile.