
Mojo struct

TMemTile

struct TMemTile[dtype_: DType, BM: Int, BN: Int]

Fields​

  • tmem_addr (UInt32)

Implemented traits​

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TrivialRegisterPassable

comptime members​

dtype​

comptime dtype = dtype_

dtype_size​

comptime dtype_size = size_of[TMemTile[dtype_, BM, BN].dtype]()

num_m_tiles​

comptime num_m_tiles = (BM // 64)
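As a rough illustration of the `num_m_tiles` expression above, the following plain-Python model (not Mojo, and not part of the library) evaluates `BM // 64` for a few tile heights. The helper name `num_m_tiles` and the sample `BM` values are chosen for illustration only; which `BM` values the struct actually accepts is not stated on this page.

```python
def num_m_tiles(bm: int) -> int:
    """Plain-Python mirror of the documented expression `BM // 64`."""
    # Floor division: any remainder below 64 rows is discarded.
    return bm // 64


if __name__ == "__main__":
    # Sample BM values are assumptions, used only to show the arithmetic.
    for bm in (64, 128, 256):
        print(f"BM={bm}: num_m_tiles={num_m_tiles(bm)}")
```

Because the expression uses floor division, a `BM` that is not a multiple of 64 would round down, so a hypothetical `BM` of 96 would yield 1 rather than 2.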

Methods​

__init__​

def __init__(tmem_addr: UInt32) -> Self

__getitem__​

def __getitem__(self, i: UInt32) -> Self

offset​

def offset[m_mma: Int, n_mma: Int](self) -> UInt32

Returns:

UInt32

allocate_register_tile​

static def allocate_register_tile[*, num_threads: Int]() -> LayoutTensor[Self.dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].element_layout]

Returns:

LayoutTensor[Self.dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].element_layout]

store_async​

def store_async[*, num_threads: Int](self, src: LayoutTensor[Self.dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].element_layout])

def store_async[src_type: DType](self, src: TileTensor[src_type, Layout[*?, *?], MutUntrackedOrigin, address_space=AddressSpace.LOCAL])

def store_async[src_type: DType, src_len: Int, src_offset: Int = 0](self, src: InlineArray[Scalar[src_type], src_len])

load_async_with_st_matrix_layout​

def load_async_with_st_matrix_layout[*, num_threads: Int](self) -> LayoutTensor[Self.dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].element_layout]

Returns:

LayoutTensor[Self.dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].element_layout]

load_st_matrix_chunk​

def load_st_matrix_chunk[*, num_threads: Int, start_repeat: Int, num_repeats: Int](self, dst: LayoutTensor[Self.dtype, STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].vec_local_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=STMatrixLayout[BM, BN, num_threads=num_threads, accum_dtype_size=Self.dtype_size].element_layout])

Load a range of repeat columns from tmem into a pre-allocated tensor.

Parameters:

  • ​num_threads (Int): Number of threads in the warp group.
  • ​start_repeat (Int): First repeat index to load (0-based).
  • ​num_repeats (Int): Number of repeats to load.

Args:

  • dst (LayoutTensor): The pre-allocated tensor that receives the loaded data.
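The `start_repeat` and `num_repeats` parameters describe a half-open range of repeats, so a full load can be thought of as a sequence of consecutive chunks. The sketch below is a plain-Python model of that bookkeeping only; it does not call `load_st_matrix_chunk`. The helper `repeat_chunks` and the `total_repeats` value are hypothetical, since this page does not state how many repeats a tile has.

```python
def repeat_chunks(total_repeats: int, chunk: int) -> list[tuple[int, int]]:
    """Split [0, total_repeats) into (start_repeat, num_repeats) pairs.

    Hypothetical helper: models the parameter values only, not the load itself.
    """
    if total_repeats < 0 or chunk <= 0:
        raise ValueError("total_repeats must be >= 0 and chunk must be > 0")
    return [
        # The last chunk is shortened so the ranges never pass total_repeats.
        (start, min(chunk, total_repeats - start))
        for start in range(0, total_repeats, chunk)
    ]


if __name__ == "__main__":
    # total_repeats=8 and chunk=3 are illustrative values, not documented ones.
    for start_repeat, num_repeats in repeat_chunks(8, 3):
        print(f"start_repeat={start_repeat}, num_repeats={num_repeats}")
```

In the documented signature both values are compile-time parameters (they appear in square brackets), so real Mojo code would need each pair to be known at compile time rather than computed in a runtime loop like this one.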

load_async​

def load_async(self, out dst: InlineArray[Scalar[Self.dtype], BN])

Returns:

InlineArray[Scalar[Self.dtype], BN]