Mojo struct

TMEMReader

@register_passable(trivial) struct TMEMReader[accum_type: DType, data_paths: Int = 16, bits: Int = 256, repeat: Int = 4]

Load accumulator fragments from tensor memory (TMEM).

SM100 Blackwell GPUs have dedicated tensor memory for MMA accumulators. This struct encapsulates the tcgen05_ld operations.

Template Parameters:

  • accum_type (DType): Accumulator data type.
  • data_paths (Int): Number of datapaths (always 16 for SM100).
  • bits (Int): Bits per load (always 256 for SM100).
  • repeat (Int): Number of repetitions for wider loads.

Fields

  • base_addr (UInt32): Base tensor memory address for the accumulator.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, Movable, UnknownDestructibility

comptime members

__copyinit__is_trivial

comptime __copyinit__is_trivial = True

__del__is_trivial

comptime __del__is_trivial = True

__moveinit__is_trivial

comptime __moveinit__is_trivial = True

frag_size

comptime frag_size = (((data_paths * (bits // 32)) // 32) * repeat)
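
As a worked example of the expression above, the following plain-Python model evaluates it for the default parameters. It is an illustration only, not Mojo and not part of the library; the function name `frag_size` and the intermediate variable names are hypothetical. Reading the `// 32` step as a split across the 32 threads of a warp is an assumption, not something this page states.

```python
def frag_size(data_paths: int = 16, bits: int = 256, repeat: int = 4) -> int:
    """Mirror the comptime expression term by term."""
    words_per_datapath = bits // 32               # 32-bit words per datapath
    words_per_load = data_paths * words_per_datapath
    per_thread = words_per_load // 32             # assumption: spread over a 32-thread warp
    return per_thread * repeat


print(frag_size())           # 16
print(frag_size(repeat=1))   # 4
```

With the defaults (16 datapaths, 256 bits, repeat of 4) the expression evaluates to 16, and it scales linearly with `repeat`.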

lower_offset

comptime lower_offset = 1048576
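
The decimal constant is easier to interpret in hexadecimal. The check below is a plain-Python illustration, not library code. It shows that 1048576 equals `16 << 16`. If TMEM addresses pack the lane index into the upper 16 bits and the column index into the lower 16 bits (an assumption, not stated on this page), then adding `lower_offset` would move an address by 16 lanes and leave the column unchanged. The helper `split_addr` is hypothetical.

```python
LOWER_OFFSET = 1048576  # value of the comptime member above

print(hex(LOWER_OFFSET))         # 0x100000
print(LOWER_OFFSET == 16 << 16)  # True


# Assumed layout (not from this page): lane in bits 31..16, column in bits 15..0.
def split_addr(addr: int) -> tuple[int, int]:
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF


print(split_addr(0x40 + LOWER_OFFSET))  # (16, 64)
```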

Methods

__init__

__init__(base_addr: UInt32) -> Self

Initialize TMEM reader.

Args:

  • base_addr (UInt32): Base tensor memory address for the accumulator.

stage_addr

stage_addr(self, stage: Int, stageN: Int) -> UInt32

Compute TMEM address for a given stage.

Args:

  • stage (Int): Stage index.
  • stageN (Int): Stage width in elements.

Returns:

UInt32: TMEM address for the stage.
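
This page does not show how the address is derived. The sketch below is a plain-Python model under one possible assumption: stages are laid out back to back, so each stage starts `stageN` columns after the previous one, beginning at `base_addr`. The formula and the name `stage_addr_model` are hypothetical and may not match the actual implementation.

```python
def stage_addr_model(base_addr: int, stage: int, stage_n: int) -> int:
    # Assumption: stage k begins stage_n columns after stage k-1.
    addr = base_addr + stage * stage_n
    return addr & 0xFFFFFFFF  # keep the result within UInt32 range


print(stage_addr_model(0, 0, 32))   # 0
print(stage_addr_model(64, 2, 32))  # 128
```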