Mojo struct
TmemAddress
struct TmemAddress
Simple TMEM address wrapper for load/store operations.
Encapsulates TMEM address encoding for accumulator fragment access. SM100 MMA accumulators are organized as 32 rows, split into:
- Upper fragment (rows 0-15): accessed via upper_addr()
- Lower fragment (rows 16-31): accessed via lower_addr()
The lower fragment address adds TMEM_LOWER_ROW_OFFSET (16 << 16) to encode the row offset in the upper 16 bits of the address.
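The address arithmetic above can be checked off-device. The following is a minimal Python model, not the Mojo API. It assumes the row (lane) offset sits in bits 31-16 and the column in bits 15-0. The helper names upper_addr, lower_addr and decode are illustrative only.

```python
TMEM_LOWER_ROW_OFFSET = 16 << 16  # row 16 encoded in bits 31-16


def upper_addr(addr: int) -> int:
    # Upper fragment (rows 0-15): the base address is used unchanged.
    return addr


def lower_addr(addr: int) -> int:
    # Lower fragment (rows 16-31): add 16 to the row field.
    return addr + TMEM_LOWER_ROW_OFFSET


def decode(addr: int) -> tuple[int, int]:
    # Split a 32-bit address into (row, column).
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF


base = 64  # column 64, row 0
print(hex(lower_addr(base)), decode(lower_addr(base)))  # 0x100040 (16, 64)
```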
Usage:

```mojo
# base_offset, dtype, size, upper_frag and lower_frag are placeholders.
var tmem = TmemAddress(base_offset)

# Load operations: issue both loads, then wait before using the results.
var upper = tmem.load_upper[dtype, size]()
var lower = tmem.load_lower[dtype, size]()
TmemAddress.wait_load()

# Store operations: issue both stores, then wait for them to complete.
tmem.store_upper[dtype, size](upper_frag)
tmem.store_lower[dtype, size](lower_frag)
TmemAddress.wait_store()

# Low-level address access for custom operations
var raw_upper = tmem.upper_addr()
var raw_lower = tmem.lower_addr()
```

Fields
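- addr (UInt32): The encoded TMEM address, with the row offset in the upper 16 bits and the column in the lower 16 bits.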
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
Methods
__init__
__init__(addr: Int) -> Self
Create TmemAddress from integer column address.
__init__(addr: UInt32) -> Self
Create TmemAddress from hardware address (UInt32).
__add__
__add__(self, offset: Int) -> Self
Create a new TmemAddress with the column offset added.
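Because __add__ shifts only the column, it composes with the row offset used by lower_addr(). Offsetting first and then taking the lower address gives the same result as taking the lower address and then adding the column offset. Below is a minimal Python model, assuming the column occupies the low 16 bits. TmemAddressModel is an illustrative stand-in, not the Mojo type.

```python
class TmemAddressModel:
    """Hypothetical Python stand-in for TmemAddress; not the Mojo API."""

    LOWER_ROW_OFFSET = 16 << 16

    def __init__(self, addr: int):
        self.addr = addr & 0xFFFFFFFF  # behave like a UInt32

    def __add__(self, offset: int) -> "TmemAddressModel":
        # Adds to the column field; the row field is unchanged as long as
        # the column stays below 2**16.
        return TmemAddressModel(self.addr + offset)

    def upper_addr(self) -> int:
        return self.addr

    def lower_addr(self) -> int:
        return self.addr + self.LOWER_ROW_OFFSET


base = TmemAddressModel(0)
tile = base + 32  # step 32 columns to the right
print(hex(tile.upper_addr()), hex(tile.lower_addr()))  # 0x20 0x100020
```

__add__ returns a new value rather than mutating base. This matches the signature above, which returns Self.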
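upper_addr
Return the raw TMEM address of the upper fragment (rows 0-15), which is the base address.
lower_addr
Return the raw TMEM address of the lower fragment (rows 16-31), which is the base address plus TMEM_LOWER_ROW_OFFSET.
load_upper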
load_upper[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self) -> InlineArray[Scalar[dtype], width]
Load upper accumulator fragment (rows 0-15).
Returns: An InlineArray of width Scalar[dtype] values holding this thread's portion of the upper fragment.
load_lower
load_lower[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self) -> InlineArray[Scalar[dtype], width]
Load lower accumulator fragment (rows 16-31).
Returns: An InlineArray of width Scalar[dtype] values holding this thread's portion of the lower fragment.
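The width parameter has to match the amount of data each thread receives. This sketch makes two assumptions. First, data_paths, bits and repeat map onto the PTX tcgen05.ld shape (.16x256b by default) and its repeat count. Second, width counts 32-bit elements such as float32. Under those assumptions, the expected width follows from a short calculation. regs_per_thread is a hypothetical helper written in Python for illustration.

```python
def regs_per_thread(data_paths: int = 16, bits: int = 256, repeat: int = 1) -> int:
    """32-bit values each of the 32 threads in a warp receives from one
    access of shape data_paths x bits, issued `repeat` times."""
    total_bits = data_paths * bits * repeat  # bits moved for the whole warp
    return total_bits // (32 * 32)           # 32 threads x 32 bits per value


print(regs_per_thread())          # 4  (defaults: 16 data paths x 256 bits)
print(regs_per_thread(repeat=8))  # 32
```

Under these assumptions, the default parameters with a 32-bit dtype would call for width = 4.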
store_upper
store_upper[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self, data: InlineArray[Scalar[dtype], width])
Store upper accumulator fragment (rows 0-15).
store_lower
store_lower[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self, data: InlineArray[Scalar[dtype], width])
Store lower accumulator fragment (rows 16-31).
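The two stores address disjoint row ranges. An upper store followed by a lower store at the same base column therefore fills all 32 rows. The Python sketch below models TMEM as a dictionary keyed by (row, column) and keeps a single value per row. Real fragments span several columns and are spread across the threads of a warp. store_fragment is a hypothetical helper.

```python
TMEM_LOWER_ROW_OFFSET = 16 << 16


def store_fragment(tmem, addr, values):
    """Write one value per row, starting at the row/column encoded in addr."""
    row0, col = addr >> 16, addr & 0xFFFF
    for i, value in enumerate(values):
        tmem[(row0 + i, col)] = value


tmem = {}
base = 8  # column 8, row 0
store_fragment(tmem, base, [f"u{i}" for i in range(16)])  # like store_upper
store_fragment(tmem, base + TMEM_LOWER_ROW_OFFSET, [f"l{i}" for i in range(16)])  # like store_lower
print(len(tmem), tmem[(15, 8)], tmem[(16, 8)])  # 32 u15 l0
```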
wait_store
static wait_store()
Wait for TMEM store operations to complete.
wait_load
static wait_load()
Wait for TMEM load operations to complete.
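The usage example issues both loads before a single wait_load() call. That pattern suggests one wait covers every load issued before it, as with the PTX tcgen05.wait::ld instruction. Under that assumption, the behavior can be modeled as follows. AsyncTmemModel is a hypothetical Python class, not part of the Mojo API; wait_store() would follow the same pattern for stores.

```python
class AsyncTmemModel:
    """Hypothetical model: TMEM loads are only issued until wait_load() runs."""

    def __init__(self, contents):
        self.contents = contents  # address -> stored value
        self.pending = []         # loads issued but not yet completed
        self.registers = {}       # values visible to the thread

    def load(self, name, addr):
        # Issue the load; the destination is not usable yet.
        self.pending.append((name, addr))

    def wait_load(self):
        # Complete every load issued so far with a single wait.
        for name, addr in self.pending:
            self.registers[name] = self.contents[addr]
        self.pending.clear()


LOWER = 16 << 16
m = AsyncTmemModel({0: 1.5, LOWER: 2.5})
m.load("upper", 0)
m.load("lower", LOWER)
assert "upper" not in m.registers  # results are not ready before the wait
m.wait_load()
print(m.registers)  # {'upper': 1.5, 'lower': 2.5}
```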