Mojo struct
TmemAddress
@register_passable("trivial")
struct TmemAddress
Simple TMEM address wrapper for load/store operations.
Encapsulates TMEM address encoding for accumulator fragment access. SM100 MMA accumulators are organized as 32 rows, split into:
- Upper fragment (rows 0-15): accessed via upper_addr()
- Lower fragment (rows 16-31): accessed via lower_addr()
The lower fragment address adds TMEM_LOWER_ROW_OFFSET (16 << 16) to encode the row offset in the upper 16 bits of the address.
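The row-offset encoding described above can be checked with plain integer arithmetic. The sketch below is a Python model of the address math only, not the Mojo implementation. The helper names `upper_addr`, `lower_addr`, and `split_fields` are illustrative free functions, and two points are assumptions: that the upper fragment uses the base address unchanged, and that the low 16 bits hold the column.

```python
# Python model of the TMEM address arithmetic; not the Mojo implementation.
TMEM_LOWER_ROW_OFFSET = 16 << 16  # value given in the description: 0x0010_0000


def upper_addr(addr: int) -> int:
    # Assumption: the upper fragment (rows 0-15) uses the base address as-is.
    return addr & 0xFFFFFFFF


def lower_addr(addr: int) -> int:
    # Lower fragment (rows 16-31): add the row offset, wrapping at 32 bits.
    return (addr + TMEM_LOWER_ROW_OFFSET) & 0xFFFFFFFF


def split_fields(addr: int) -> tuple[int, int]:
    # Upper 16 bits carry the row; the low 16 bits are assumed to be the column.
    return addr >> 16, addr & 0xFFFF


base = 0x40  # hypothetical base offset: row 0, column 64

print(hex(lower_addr(base)))           # 0x100040
print(split_fields(upper_addr(base)))  # (0, 64)
print(split_fields(lower_addr(base)))  # (16, 64)
```

Under these assumptions, both fragment addresses share the same column field and differ only in the row field, which is why a single base address is enough to reach both halves of the accumulator.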
Usage:
var tmem = TmemAddress(base_offset)
# Load operations
var upper = tmem.load_upper[dtype, size]()
var lower = tmem.load_lower[dtype, size]()
TmemAddress.wait_load()
# Store operations
tmem.store_upper[dtype, size](upper_frag)
tmem.store_lower[dtype, size](lower_frag)
TmemAddress.wait_store()
# Low-level address access for custom operations
raw_upper = tmem.upper_addr()
raw_lower = tmem.lower_addr()
Fields
- addr (UInt32): The wrapped TMEM address.
Implemented traits
AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable
comptime members
__copyinit__is_trivial
comptime __copyinit__is_trivial = True
__del__is_trivial
comptime __del__is_trivial = True
__moveinit__is_trivial
comptime __moveinit__is_trivial = True
Methods
__init__
__init__(addr: UInt32) -> Self
__add__
__add__(self, offset: UInt32) -> Self
Create new TmemAddress with column offset added.
__add__(self, offset: Int) -> Self
Create new TmemAddress with column offset added.
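The value semantics of `__add__` can be illustrated with a small Python stand-in. This is a sketch under stated assumptions, not the Mojo struct: `TmemAddressModel` is a hypothetical class, the address is treated as a 32-bit integer, and the column is assumed to live in the low bits so that adding a small offset leaves the row field untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmemAddressModel:
    """Hypothetical Python stand-in for the address wrapper."""

    addr: int  # modeled as a 32-bit unsigned value

    def __add__(self, offset: int) -> "TmemAddressModel":
        # Return a new wrapper; the original value is left unchanged.
        return TmemAddressModel((self.addr + offset) & 0xFFFFFFFF)


base = TmemAddressModel(0x40)  # hypothetical base offset: column 64
shifted = base + 8             # advance by 8 columns

print(hex(shifted.addr))                       # 0x48
print(hex(base.addr))                          # 0x40
print(shifted.addr >> 16 == base.addr >> 16)   # True
```

Returning a new value instead of mutating in place matches the "Create new TmemAddress" wording above and lets a base address be reused while stepping through column offsets.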
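upper_addr
Raw address of the upper fragment (rows 0-15).
lower_addr
Raw address of the lower fragment (rows 16-31): the base address plus TMEM_LOWER_ROW_OFFSET.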
load_upper
load_upper[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self) -> SIMD[dtype, width]
Load upper accumulator fragment (rows 0-15).
Returns: The upper fragment as SIMD[dtype, width].
load_lower
load_lower[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self) -> SIMD[dtype, width]
Load lower accumulator fragment (rows 16-31).
Returns: The lower fragment as SIMD[dtype, width].
store_upper
store_upper[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self, data: SIMD[dtype, width])
Store upper accumulator fragment (rows 0-15).
store_lower
store_lower[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self, data: SIMD[dtype, width])
Store lower accumulator fragment (rows 16-31).
wait_store
static wait_store()
Wait for TMEM store operations to complete.
wait_load
static wait_load()
Wait for TMEM load operations to complete.