
Mojo struct

TmemAddress

@register_passable(trivial) struct TmemAddress

Simple TMEM address wrapper for load/store operations.

Encapsulates TMEM address encoding for accumulator fragment access. SM100 MMA accumulators are organized as 32 rows, split into:

  • Upper fragment (rows 0-15): accessed via upper_addr()
  • Lower fragment (rows 16-31): accessed via lower_addr()

The lower fragment address is the base address plus TMEM_LOWER_ROW_OFFSET (16 << 16, i.e. 0x100000), which encodes the 16-row offset in the upper 16 bits of the address.
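The arithmetic described above can be sketched in a few lines. This is a plain-Python model, not the Mojo API: the function names mirror upper_addr() and lower_addr() for readability, the base value is made up, and it assumes the upper fragment address is simply the stored address unchanged.

```python
# Plain-Python model of the address arithmetic; not the Mojo implementation.
TMEM_LOWER_ROW_OFFSET = 16 << 16  # 0x100000, the value quoted in the text


def upper_addr(base: int) -> int:
    """Upper fragment (rows 0-15): assumed to be the base address itself."""
    return base & 0xFFFFFFFF


def lower_addr(base: int) -> int:
    """Lower fragment (rows 16-31): base plus the row offset, kept to 32 bits."""
    return (base + TMEM_LOWER_ROW_OFFSET) & 0xFFFFFFFF


base = 0x40  # hypothetical base offset
print(hex(upper_addr(base)))  # 0x40
print(hex(lower_addr(base)))  # 0x100040
```

Only the upper half of the word changes between the two fragments, so the low 16 bits are the same for both.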

Usage:

var tmem = TmemAddress(base_offset)

# Load operations
var upper = tmem.load_upper[dtype, size]()
var lower = tmem.load_lower[dtype, size]()
TmemAddress.wait_load()

# Store operations
tmem.store_upper[dtype, size](upper_frag)
tmem.store_lower[dtype, size](lower_frag)
TmemAddress.wait_store()

# Low-level address access for custom operations
raw_upper = tmem.upper_addr()
raw_lower = tmem.lower_addr()

Fields

  • addr (UInt32): The wrapped TMEM address.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable

comptime members

__copyinit__is_trivial

comptime __copyinit__is_trivial = True

__del__is_trivial

comptime __del__is_trivial = True

__moveinit__is_trivial

comptime __moveinit__is_trivial = True

Methods

__init__

__init__(addr: UInt32) -> Self

__add__

__add__(self, offset: UInt32) -> Self

Create new TmemAddress with column offset added.

__add__(self, offset: Int) -> Self

Create new TmemAddress with column offset added.
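As a rough illustration of what "column offset added" means, the sketch below models the operator in plain Python. TmemAddressModel is a hypothetical stand-in, not the Mojo struct, and the sketch assumes the column index occupies the low 16 bits of the address, so a small offset leaves the row field in the upper 16 bits untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmemAddressModel:
    """Hypothetical stand-in for TmemAddress; holds one 32-bit address."""

    addr: int

    def __add__(self, offset: int) -> "TmemAddressModel":
        # Return a new value; the original address is left unchanged.
        return TmemAddressModel((self.addr + offset) & 0xFFFFFFFF)


base = TmemAddressModel(0x0001_0000)  # hypothetical: row field 1, column 0
shifted = base + 8                    # advance by 8 columns
print(hex(base.addr), hex(shifted.addr))  # 0x10000 0x10008
```

Because the operator returns a new value rather than mutating the receiver, one base address can be reused to derive the addresses of several column tiles.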

upper_addr

upper_addr(self) -> UInt32

Raw address for upper fragment (rows 0-15).

Returns:

UInt32

lower_addr

lower_addr(self) -> UInt32

Raw address for lower fragment (rows 16-31).

Returns:

UInt32
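When passing these raw addresses to custom operations, it can help to check which row and column they encode. The helper below is a hypothetical debugging aid in plain Python, not part of the Mojo API; it assumes the row sits in bits 31-16 and the column in bits 15-0, and the sample addresses are invented.

```python
def split_tmem_addr(addr: int) -> tuple[int, int]:
    """Return (row, column), assuming row in bits 31-16 and column in bits 15-0."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF


upper = 0x0000_0020         # hypothetical upper_addr() result
lower = upper + (16 << 16)  # the matching lower_addr() value under this model
print(split_tmem_addr(upper))  # (0, 32)
print(split_tmem_addr(lower))  # (16, 32)
```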

load_upper

load_upper[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self) -> SIMD[dtype, width]

Load upper accumulator fragment (rows 0-15).

Returns:

SIMD

load_lower

load_lower[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self) -> SIMD[dtype, width]

Load lower accumulator fragment (rows 16-31).

Returns:

SIMD

store_upper

store_upper[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self, data: SIMD[dtype, width])

Store upper accumulator fragment (rows 0-15).

store_lower

store_lower[dtype: DType, width: Int, data_paths: Int = 16, bits: Int = 256, repeat: Int = 1](self, data: SIMD[dtype, width])

Store lower accumulator fragment (rows 16-31).

wait_store

static wait_store()

Wait for TMEM store operations to complete.

wait_load

static wait_load()

Wait for TMEM load operations to complete.
