
Mojo struct

TMemTile

struct TMemTile[dtype_: DType, BM: Int, BN: Int]

Fields

  • tmem_addr (UInt32): The tensor memory address of this tile.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

dtype

comptime dtype = dtype_

dtype_size

comptime dtype_size = size_of[TMemTile[dtype_, BM, BN].dtype]()

num_m_tiles

comptime num_m_tiles = (BM // 64)

Methods

__init__

__init__(tmem_addr: UInt32) -> Self

__getitem__

__getitem__(self, i: UInt32) -> Self

offset

offset[m_mma: Int, n_mma: Int](self) -> UInt32

Returns:

UInt32
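Only the signature is documented above, so the following is a hedged sketch of how `offset` might be used: the tile shape, the MMA coordinates, and `base_addr` are all hypothetical, and the code assumes a Blackwell-class GPU with a prior tensor-memory allocation.

```mojo
# Hypothetical sketch: compute the tmem address of one MMA sub-tile.
# `base_addr` is a placeholder for a real tcgen05 tensor-memory
# allocation; it is not a value these docs provide.
comptime Tile = TMemTile[DType.float32, 128, 256]

fn sub_tile_addr(base_addr: UInt32) -> UInt32:
    var tile = Tile(base_addr)
    # Address of the sub-tile at MMA coordinates (m_mma=1, n_mma=2),
    # suitable for passing to subsequent tmem loads/stores.
    return tile.offset[m_mma=1, n_mma=2]()
```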

allocate_register_tile

static allocate_register_tile[*, num_threads: Int]() -> LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

Returns:

LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

store_async

store_async[*, num_threads: Int](self, src: LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout])

store_async[src_type: DType](self, src: TileTensor[src_type, Layout[*?, *?], MutExternalOrigin, address_space=AddressSpace.LOCAL])

store_async[src_type: DType, src_len: Int, src_offset: Int = 0](self, src: InlineArray[Scalar[src_type], src_len])
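The first `store_async` overload pairs naturally with `allocate_register_tile`, since both use the same `STMatrixLayout`-derived type. A hedged sketch (shapes, thread count, and `base_addr` are hypothetical, and the synchronization step is only gestured at):

```mojo
# Hypothetical sketch: copy accumulator fragments from registers into
# tensor memory. `base_addr` stands in for a real tmem allocation.
fn store_accumulators(base_addr: UInt32):
    comptime Tile = TMemTile[DType.float32, 64, 64]
    var tile = Tile(base_addr)
    var regs = Tile.allocate_register_tile[num_threads=128]()
    # ... fill `regs` with accumulator values ...
    tile.store_async[num_threads=128](regs)
    # store_async is asynchronous; the appropriate tmem store fence is
    # required before the data is visible (omitted in this sketch).
```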

load_async_with_st_matrix_layout

load_async_with_st_matrix_layout[*, num_threads: Int](self) -> LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

Returns:

LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

load_st_matrix_chunk

load_st_matrix_chunk[*, num_threads: Int, start_repeat: Int, num_repeats: Int](self, dst: LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout])

Load a range of repeat columns from tmem into a pre-allocated tensor.

Parameters:

  • num_threads (Int): Number of threads in the warp group.
  • start_repeat (Int): First repeat index to load (0-based).
  • num_repeats (Int): Number of repeats to load.

Args:

  • dst: The pre-allocated register tile that receives the loaded tmem columns.
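A hedged sketch of chunked loading, e.g. to overlap tmem reads with computation; the tile shape, thread count, chunk split, and `base_addr` are all assumptions, not values from these docs:

```mojo
# Hypothetical sketch: load the tile in two column chunks instead of a
# single load_async_with_st_matrix_layout call.
comptime Tile = TMemTile[DType.float32, 128, 128]
var tile = Tile(base_addr)
var dst = Tile.allocate_register_tile[num_threads=128]()
# First half of the repeat columns ...
tile.load_st_matrix_chunk[num_threads=128, start_repeat=0, num_repeats=2](dst)
# ... consume the first chunk, then fetch the rest.
tile.load_st_matrix_chunk[num_threads=128, start_repeat=2, num_repeats=2](dst)
```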
load_async

load_async(self, out dst: InlineArray[Scalar[TMemTile[dtype_, BM, BN].dtype], BN])

Returns:

InlineArray[Scalar[TMemTile[dtype_, BM, BN].dtype], BN]
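Since `dst` is an `out` argument, the call binds it like a return value. A hedged sketch (the dtype, shape, and `base_addr` are hypothetical):

```mojo
# Hypothetical sketch: read a BN-element slice of the tile into an
# InlineArray via the `out` result convention. `base_addr` is a
# placeholder for a real tensor-memory allocation.
comptime Tile = TMemTile[DType.bfloat16, 64, 128]

fn read_slice(base_addr: UInt32) -> InlineArray[Scalar[DType.bfloat16], 128]:
    var tile = Tile(base_addr)
    # The `out dst` argument is bound to the call result. The load is
    # asynchronous; a wait is needed before using the values (omitted).
    var dst = tile.load_async()
    return dst
```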