Mojo struct
TMemTile
struct TMemTile[dtype_: DType, BM: Int, BN: Int]
Fields

- tmem_addr (UInt32):
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
comptime members

dtype
comptime dtype = dtype_
dtype_size
comptime dtype_size = size_of[TMemTile[dtype_, BM, BN].dtype]()
num_m_tiles
comptime num_m_tiles = (BM // 64)
Methods

__init__
__init__(tmem_addr: UInt32) -> Self
__getitem__
__getitem__(self, i: UInt32) -> Self
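Since num_m_tiles is BM // 64 and __getitem__ returns Self, indexing appears to select a 64-row sub-tile. A minimal sketch under that assumption (the surrounding kernel context and the origin of `tile` are hypothetical):

```mojo
# Sketch: visit each 64-row m-tile of a larger tmem tile.
# `tile` is assumed to be a TMemTile[DType.float32, 128, 64]
# obtained elsewhere in the kernel.
for i in range(tile.num_m_tiles):   # num_m_tiles == BM // 64
    var m_tile = tile[UInt32(i)]    # __getitem__ returns another TMemTile
    # ... operate on m_tile ...
```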
offset

allocate_register_tile
static allocate_register_tile[*, num_threads: Int]() -> LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]
Returns:
store_async
store_async[*, num_threads: Int](self, src: LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout])
store_async[src_type: DType](self, src: TileTensor[src_type, Layout[*?, *?], MutExternalOrigin, address_space=AddressSpace.LOCAL])
store_async[src_type: DType, src_len: Int, src_offset: Int = 0](self, src: InlineArray[Scalar[src_type], src_len])
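A hedged sketch of pairing allocate_register_tile with the LayoutTensor overload of store_async. The tile shape, thread count, and function name are illustrative assumptions; the key point is that allocate_register_tile produces a register tensor in exactly the STMatrixLayout that store_async expects:

```mojo
comptime BM = 128
comptime BN = 64
comptime NUM_THREADS = 128

fn store_sketch(tile: TMemTile[DType.float32, BM, BN]):
    # Allocate registers in the st_matrix layout required by store_async.
    var regs = TMemTile[DType.float32, BM, BN].allocate_register_tile[
        num_threads=NUM_THREADS
    ]()
    # ... fill `regs` with accumulator values ...
    # Asynchronously write the register tile into tensor memory.
    tile.store_async[num_threads=NUM_THREADS](regs)
```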
load_async_with_st_matrix_layout
load_async_with_st_matrix_layout[*, num_threads: Int](self) -> LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]
Returns:
load_st_matrix_chunk
load_st_matrix_chunk[*, num_threads: Int, start_repeat: Int, num_repeats: Int](self, dst: LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout])
Load a range of repeat columns from tmem into a pre-allocated tensor.
Parameters:

- num_threads (Int): Number of threads in the warp group.
- start_repeat (Int): First repeat index to load (0-based).
- num_repeats (Int): Number of repeats to load.

Args:

- dst (LayoutTensor[TMemTile[dtype_, BM, BN].dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=TMemTile[dtype_, BM, BN].dtype_size].element_layout]): Pre-allocated register tensor.
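A sketch of chunked loading, splitting the repeat columns across two calls so work on the first chunk can overlap with fetching the second. The repeat counts below are illustrative assumptions; the valid range depends on BN and the st_matrix layout:

```mojo
fn chunked_load_sketch(tile: TMemTile[DType.float32, 128, 64]):
    comptime NUM_THREADS = 128
    # Pre-allocate the destination register tensor once.
    var dst = TMemTile[DType.float32, 128, 64].allocate_register_tile[
        num_threads=NUM_THREADS
    ]()
    # Load the first range of repeat columns from tmem ...
    tile.load_st_matrix_chunk[
        num_threads=NUM_THREADS, start_repeat=0, num_repeats=2
    ](dst)
    # ... process that chunk, then fetch the remaining columns.
    tile.load_st_matrix_chunk[
        num_threads=NUM_THREADS, start_repeat=2, num_repeats=2
    ](dst)
```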
load_async
load_async(self, out dst: InlineArray[Scalar[TMemTile[dtype_, BM, BN].dtype], BN])
Returns:
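Because dst is declared as an out parameter, the loaded InlineArray is produced as the call's result. A hedged usage sketch, assuming Mojo's named-result calling convention applies here:

```mojo
fn row_load_sketch(tile: TMemTile[DType.float32, 128, 64]):
    # `out dst` means the call constructs and returns the array of BN scalars.
    var row: InlineArray[Scalar[DType.float32], 64] = tile.load_async()
    # ... consume the scalars in `row` ...
```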