
Mojo struct

TmemAddress

struct TmemAddress

Simple TMEM address wrapper for load/store operations.

Encapsulates TMEM address encoding for accumulator fragment access. SM100 MMA accumulators are organized as 32 rows, split into:

  • Upper fragment (rows 0-15): accessed via upper_addr()
  • Lower fragment (rows 16-31): accessed via lower_addr()
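The row split above is a simple mapping from an accumulator row (0-31) to a fragment and to a row inside that fragment. The sketch below models this mapping on the host in plain Mojo. The helpers `is_lower_row` and `fragment_row` are hypothetical illustrations and are not part of the TmemAddress API.

```mojo
# Hypothetical helpers that model the 32-row accumulator split.
# They are not part of TmemAddress; they only illustrate the row ranges.


def is_lower_row(row: Int) -> Bool:
    # Rows 16-31 live in the lower fragment (lower_addr()),
    # rows 0-15 in the upper fragment (upper_addr()).
    return row >= 16


def fragment_row(row: Int) -> Int:
    # Row index within its 16-row fragment (valid for row in 0..31).
    return row % 16


def main():
    # Sample a few rows across the full 32-row accumulator.
    for row in range(0, 32, 5):
        print("row", row, "lower fragment:", is_lower_row(row), "local row:", fragment_row(row))
```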

The lower fragment address adds TMEM_LOWER_ROW_OFFSET (16 << 16) to encode the row offset in the upper 16 bits of the address.
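To illustrate this encoding on the host, the sketch below recomputes both addresses in plain Mojo. It rests on three assumptions. First, bits 31-16 hold the row and bits 15-0 hold the column, which the `16 << 16` offset suggests. Second, the upper fragment uses the base address unchanged. Third, `model_upper_addr`, `model_lower_addr`, and the base column of 64 are hypothetical names and values chosen only for this model.

```mojo
# Hypothetical host-side model of TmemAddress's address arithmetic.
# Assumption: row (lane) offset in bits 31..16, column in bits 15..0.


def tmem_lower_row_offset() -> UInt32:
    # Mirrors TMEM_LOWER_ROW_OFFSET: row 16 encoded in the upper 16 bits.
    return UInt32(16 << 16)


def model_upper_addr(addr: UInt32) -> UInt32:
    # Assumption: the upper fragment (rows 0-15) uses the base address as-is.
    return addr


def model_lower_addr(addr: UInt32) -> UInt32:
    # The lower fragment (rows 16-31) adds the row offset.
    return addr + tmem_lower_row_offset()


def main():
    var base = UInt32(64)  # hypothetical column address
    var up = model_upper_addr(base)
    var lo = model_lower_addr(base)
    # Decode the row and column fields of each address.
    print("upper:", up, "row:", up >> 16, "col:", up & 0xFFFF)
    print("lower:", lo, "row:", lo >> 16, "col:", lo & 0xFFFF)
```

In this model, both addresses point at the same column (64) and differ only in the row field (0 versus 16). That is why a single column address can serve both fragments.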

Usage:

```mojo
var tmem = TmemAddress(base_offset)

# Load operations: issue both loads, then wait before reading the results
var upper = tmem.load_upper[dtype, width]()
var lower = tmem.load_lower[dtype, width]()
TmemAddress.wait_load()

# Store operations: issue both stores, then wait for them to complete
tmem.store_upper[dtype, width](upper_frag)
tmem.store_lower[dtype, width](lower_frag)
TmemAddress.wait_store()

# Low-level address access for custom operations
var raw_upper = tmem.upper_addr()
var raw_lower = tmem.lower_addr()
```

Fields

  • addr (UInt32): The raw TMEM address.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TrivialRegisterPassable

Methods

__init__

def __init__(addr: Int) -> Self

Create TmemAddress from integer column address.

def __init__(addr: UInt32) -> Self

Create TmemAddress from hardware address (UInt32).

__add__

def __add__(self, offset: Int) -> Self

Create new TmemAddress with column offset added.
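The following sketch assumes that `__add__` simply adds `offset` to the raw `UInt32` address. Under that assumption, a column offset changes only the low column bits, provided the column stays below `1 << 16`. Offsetting first and then taking the lower-fragment address therefore matches the reverse order. The names `model_add` and `model_lower_addr` and the chunk width of 16 columns are hypothetical.

```mojo
# Hypothetical model of TmemAddress.__add__ and lower_addr().
# Assumption: __add__ adds the offset to the raw UInt32 address, so while
# the column stays below 1 << 16 only the column field (bits 15..0) changes.


def model_add(addr: UInt32, offset: Int) -> UInt32:
    return addr + UInt32(offset)


def model_lower_addr(addr: UInt32) -> UInt32:
    # TMEM_LOWER_ROW_OFFSET = 16 << 16 (row 16 in the upper 16 bits)
    return addr + UInt32(16 << 16)


def main():
    var base = UInt32(0)  # hypothetical accumulator base column
    var chunk_cols = 16  # hypothetical columns per fragment chunk
    # Step through four consecutive column chunks of the accumulator.
    for i in range(4):
        var chunk = model_add(base, i * chunk_cols)
        var lower = model_lower_addr(chunk)
        print("chunk", i, "column:", chunk & 0xFFFF, "lower row:", lower >> 16)
```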

upper_addr

def upper_addr(self) -> UInt32

Raw address for upper fragment (rows 0-15).

Returns:

UInt32

lower_addr

def lower_addr(self) -> UInt32

Raw address for lower fragment (rows 16-31).

Returns:

UInt32

load_upper

def load_upper[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self) -> InlineArray[Scalar[dtype], width]

Load upper accumulator fragment (rows 0-15).

Returns:

InlineArray[Scalar[dtype], width]

load_lower

def load_lower[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self) -> InlineArray[Scalar[dtype], width]

Load lower accumulator fragment (rows 16-31).

Returns:

InlineArray[Scalar[dtype], width]

store_upper

def store_upper[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self, data: InlineArray[Scalar[dtype], width])

Store upper accumulator fragment (rows 0-15).

store_lower

def store_lower[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self, data: InlineArray[Scalar[dtype], width])

Store lower accumulator fragment (rows 16-31).

wait_store

static def wait_store()

Wait for TMEM store operations to complete.

wait_load

static def wait_load()

Wait for TMEM load operations to complete.