
Mojo struct

TMemTile

struct TMemTile[dtype_: DType, BM: Int, BN: Int]

Fields

  • tmem_addr (UInt32): The tensor memory address of this tile.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

dtype

comptime dtype = dtype_

dtype_size

comptime dtype_size = size_of[TMemTile[dtype_, BM, BN].dtype]()

num_m_tiles

comptime num_m_tiles = (BM // 64)

Methods

__init__

__init__(tmem_addr: UInt32) -> Self

__getitem__

__getitem__(self, i: UInt32) -> Self

offset

offset[m_mma: Int, n_mma: Int](self) -> UInt32

Returns:

UInt32
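Only the signature is documented above, so the following is a hedged sketch of how `offset` might be used: the tile shape, the MMA coordinates, and `base_addr` are all hypothetical, and the code assumes a Blackwell-class GPU with a prior tensor-memory allocation.

```mojo
# Hypothetical sketch: compute the tmem address of one MMA sub-tile.
# `base_addr` is a placeholder for a real tcgen05 tensor-memory
# allocation; it is not a value these docs provide.
comptime Tile = TMemTile[DType.float32, 128, 256]

fn sub_tile_addr(base_addr: UInt32) -> UInt32:
    var tile = Tile(base_addr)
    # Address of the sub-tile at MMA coordinates (m_mma=1, n_mma=2),
    # suitable for passing to subsequent tmem loads/stores.
    return tile.offset[m_mma=1, n_mma=2]()
```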

allocate_register_tile

static allocate_register_tile[*, num_threads: Int]() -> LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

Returns:

LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

store_async

store_async[*, num_threads: Int](self, src: LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout])

store_async[src_type: DType](self, src: TileTensor[src_type, Layout[*?, *?], MutExternalOrigin, address_space=AddressSpace.LOCAL])

store_async[src_type: DType, src_len: Int, src_offset: Int = 0](self, src: InlineArray[Scalar[src_type], src_len])
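The first `store_async` overload pairs naturally with `allocate_register_tile`, since both use the same `STMatrixLayout`-derived type. A hedged sketch (shapes, thread count, and `base_addr` are hypothetical, and the synchronization step is only gestured at):

```mojo
# Hypothetical sketch: copy accumulator fragments from registers into
# tensor memory. `base_addr` stands in for a real tmem allocation.
fn store_accumulators(base_addr: UInt32):
    comptime Tile = TMemTile[DType.float32, 64, 64]
    var tile = Tile(base_addr)
    var regs = Tile.allocate_register_tile[num_threads=128]()
    # ... fill `regs` with accumulator values ...
    tile.store_async[num_threads=128](regs)
    # store_async is asynchronous; the appropriate tmem store fence is
    # required before the data is visible (omitted in this sketch).
```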

load_async_with_st_matrix_layout

load_async_with_st_matrix_layout[*, num_threads: Int](self) -> LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

Returns:

LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]

load_st_matrix_chunk

load_st_matrix_chunk[*, num_threads: Int, start_repeat: Int, num_repeats: Int](self, dst: LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout])

Load a range of repeat columns from tmem into a pre-allocated tensor.

Parameters:

  • num_threads (Int): Number of threads in the warp group.
  • start_repeat (Int): First repeat index to load (0-based).
  • num_repeats (Int): Number of repeats to load.

Args:

  • dst: The pre-allocated register tile that receives the loaded tmem columns.
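A hedged sketch of chunked loading, e.g. to overlap tmem reads with computation; the tile shape, thread count, chunk split, and `base_addr` are all assumptions, not values from these docs:

```mojo
# Hypothetical sketch: load the tile in two column chunks instead of a
# single load_async_with_st_matrix_layout call.
comptime Tile = TMemTile[DType.float32, 128, 128]
var tile = Tile(base_addr)
var dst = Tile.allocate_register_tile[num_threads=128]()
# First half of the repeat columns ...
tile.load_st_matrix_chunk[num_threads=128, start_repeat=0, num_repeats=2](dst)
# ... consume the first chunk, then fetch the rest.
tile.load_st_matrix_chunk[num_threads=128, start_repeat=2, num_repeats=2](dst)
```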
load_async

load_async(self, out dst: InlineArray[Scalar[TMemTile[dtype_, BM, BN].dtype], BN])

Returns:

InlineArray[Scalar[TMemTile[dtype_, BM, BN].dtype], BN]
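Since `dst` is an `out` argument, the call binds it like a return value. A hedged sketch (the dtype, shape, and `base_addr` are hypothetical):

```mojo
# Hypothetical sketch: read a BN-element slice of the tile into an
# InlineArray via the `out` result convention. `base_addr` is a
# placeholder for a real tensor-memory allocation.
comptime Tile = TMemTile[DType.bfloat16, 64, 128]

fn read_slice(base_addr: UInt32) -> InlineArray[Scalar[DType.bfloat16], 128]:
    var tile = Tile(base_addr)
    # The `out dst` argument is bound to the call result. The load is
    # asynchronous; a wait is needed before using the values (omitted).
    var dst = tile.load_async()
    return dst
```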