Mojo struct
TmemAddress
struct TmemAddress
Simple TMEM address wrapper for load/store operations.
Encapsulates TMEM address encoding for accumulator fragment access. SM100 MMA accumulators are organized as 32 rows, split into:
- Upper fragment (rows 0-15): accessed via upper_addr()
- Lower fragment (rows 16-31): accessed via lower_addr()
The lower fragment address adds TMEM_LOWER_ROW_OFFSET (16 << 16) to encode the row offset in the upper 16 bits of the address.
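The encoding described above can be checked with plain integer arithmetic. The following is a minimal sketch that runs on the CPU and does not use TmemAddress itself. The helper names (`lower_fragment_addr`, `row_bits`, `column_bits`) are hypothetical, and treating the lower 16 bits as the column address is an assumption inferred from the description, not something the library documents here.

```mojo
# Illustrative model only: these helpers are hypothetical and are not part of
# TmemAddress. They reproduce the documented arithmetic on plain UInt32 values.

def lower_fragment_addr(addr: UInt32) -> UInt32:
    # TMEM_LOWER_ROW_OFFSET is documented as 16 << 16.
    return addr + (UInt32(16) << 16)


def row_bits(addr: UInt32) -> UInt32:
    # The upper 16 bits carry the row offset.
    return addr >> 16


def column_bits(addr: UInt32) -> UInt32:
    # Assumption: the lower 16 bits carry the column address.
    return addr & UInt32(0xFFFF)


def main():
    var base = UInt32(64)  # example column address
    var lower = lower_fragment_addr(base)
    # Upper fragment: row bits 0, column bits 64.
    print(row_bits(base), column_bits(base))
    # Lower fragment: row bits 16, column bits still 64.
    print(row_bits(lower), column_bits(lower))
```

Under these assumptions, the lower fragment address differs from the upper one only in its row bits, so both fragments refer to the same columns.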
Usage:
var tmem = TmemAddress(base_offset)
# Load operations
var upper = tmem.load_upper[dtype, size]()
var lower = tmem.load_lower[dtype, size]()
TmemAddress.wait_load()
# Store operations
tmem.store_upper[dtype, size](upper_frag)
tmem.store_lower[dtype, size](lower_frag)
TmemAddress.wait_store()
# Low-level address access for custom operations
raw_upper = tmem.upper_addr()
raw_lower = tmem.lower_addr()
Fields
- addr (UInt32):
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDeletable,
Movable,
RegisterPassable,
TrivialRegisterPassable
Methods
__init__
def __init__(addr: Int) -> Self
Create TmemAddress from integer column address.
def __init__(addr: UInt32) -> Self
Create TmemAddress from hardware address (UInt32).
__add__
def __add__(self, offset: Int) -> Self
Create new TmemAddress with column offset added.
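As a rough illustration of stepping through accumulator columns, the sketch below models the column offset as plain integer addition on a UInt32. This is an assumption about how `__add__` behaves, and `tile_columns` is an example value rather than a library constant. It shows that, under this model, adding a column offset leaves the row bits of the lower fragment address untouched.

```mojo
# Hypothetical model: plain UInt32 arithmetic standing in for TmemAddress.
# tile_columns is an example value, not taken from the library.

def main():
    var row_offset = UInt32(16) << 16  # documented TMEM_LOWER_ROW_OFFSET
    var base = UInt32(0)  # example base column address
    var tile_columns = 32  # assumed tile width, in columns

    for i in range(4):
        # Models `tmem + offset`: advance the column address by one tile.
        var upper = base + UInt32(i * tile_columns)
        # Models `lower_addr()`: same columns, row offset in the upper 16 bits.
        var lower = upper + row_offset
        # Prints: tile index, upper address, lower row bits, lower column bits.
        print(i, upper, lower >> 16, lower & UInt32(0xFFFF))
```

For every tile the row bits of the lower address stay at 16 while the column bits track the upper address, provided the column address stays below 2^16.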
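upper_addr
Raw TMEM address of the upper fragment (rows 0-15).
lower_addr
Raw TMEM address of the lower fragment (rows 16-31): the upper address plus TMEM_LOWER_ROW_OFFSET.
load_upper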
def load_upper[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self) -> InlineArray[Scalar[dtype], width]
Load upper accumulator fragment (rows 0-15).
Returns: InlineArray[Scalar[dtype], width]
load_lower
def load_lower[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self) -> InlineArray[Scalar[dtype], width]
Load lower accumulator fragment (rows 16-31).
Returns: InlineArray[Scalar[dtype], width]
store_upper
def store_upper[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self, data: InlineArray[Scalar[dtype], width])
Store upper accumulator fragment (rows 0-15).
store_lower
def store_lower[dtype: DType, width: Int, data_paths: Int = Int(16), bits: Int = Int(256), repeat: Int = Int(1)](self, data: InlineArray[Scalar[dtype], width])
Store lower accumulator fragment (rows 16-31).
wait_store
static def wait_store()
Wait for TMEM store operations to complete.
wait_load
static def wait_load()
Wait for TMEM load operations to complete.