Mojo struct
TMEMReader
@register_passable(trivial)
struct TMEMReader[accum_type: DType, data_paths: Int = 16, bits: Int = 256, repeat: Int = 4]
Load accumulator fragments from tensor memory (TMEM).
SM100 (Blackwell) GPUs have dedicated tensor memory for MMA accumulators. This struct encapsulates the tcgen05_ld operations that load those accumulators from TMEM into registers.
Template Parameters:
- accum_type (DType): Accumulator data type.
- data_paths (Int): Number of datapaths (always 16 for SM100).
- bits (Int): Bits per load (always 256 for SM100).
- repeat (Int): Number of repetitions for wider loads.
Fields
- base_addr (UInt32)
Implemented traits
AnyType, Copyable, ImplicitlyCopyable, Movable, UnknownDestructibility
comptime members
__copyinit__is_trivial
comptime __copyinit__is_trivial = True
__del__is_trivial
comptime __del__is_trivial = True
__moveinit__is_trivial
comptime __moveinit__is_trivial = True
frag_size
comptime frag_size = (((data_paths * (bits // 32)) // 32) * repeat)
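As a worked example, substituting the default parameters (data_paths = 16, bits = 256, repeat = 4) into the frag_size expression gives the following. Here bits // 32 is the number of 32-bit words each datapath contributes per load; reading the division by 32 as spreading those words across the 32 threads of a warp is an interpretation, not something this page states. Under that reading, each thread holds 16 values per read.

```latex
\text{frag\_size}
  = \left\lfloor \frac{16 \cdot \lfloor 256 / 32 \rfloor}{32} \right\rfloor \cdot 4
  = \left\lfloor \frac{16 \cdot 8}{32} \right\rfloor \cdot 4
  = \left\lfloor \frac{128}{32} \right\rfloor \cdot 4
  = 4 \cdot 4
  = 16
```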
lower_offset
comptime lower_offset = 1048576
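The constant 1048576 is exactly 2^20 = 16 · 2^16. Assuming the PTX tensor-memory address layout, in which bits 31:16 of a 32-bit TMEM address hold the lane index and bits 15:0 hold the column, the decomposition below shows that this value encodes lane 16, column 0. That is one data_paths-sized step of 16 lanes. That lower_offset is used to reach the second group of 16 lanes is an inference from the arithmetic, not something this page documents.

```latex
\text{lower\_offset} = 1048576 = 2^{20} = 16 \cdot 2^{16}
\quad\Rightarrow\quad
\text{lane} = \left\lfloor \frac{1048576}{2^{16}} \right\rfloor = 16,
\qquad
\text{column} = 1048576 \bmod 2^{16} = 0
```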
Methods
__init__
__init__(base_addr: UInt32) -> Self
Initialize TMEM reader.
Args:
- base_addr (UInt32): Base tensor memory address for the accumulator.
stage_addr