Mojo struct
OutputRegisterBufferRDNA
struct OutputRegisterBufferRDNA[dtype: DType, num_m_mmas: Int, num_n_mmas: Int]
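RDNA-specific output register buffer for Wave32 WMMA (Wave Matrix Multiply-Accumulate) operations. It holds each thread's portion of the output accumulator tile in local (register) memory.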
Fields
- reg_tile (OutputRegisterBufferRDNA[dtype, num_m_mmas, num_n_mmas].RegisterTileType)
Implemented traits
AnyType, ImplicitlyDestructible, RegisterBuffer
comptime members
__del__is_trivial
comptime __del__is_trivial = True
output_frag_size
comptime output_frag_size = RDNA_CD_FRAG_SIZE
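The value of RDNA_CD_FRAG_SIZE is not shown on this page. As a rough illustration of where a per-lane output fragment size comes from: assuming a 16 x 16 WMMA output (C/D) tile, as used by RDNA3 WMMA instructions, spread evenly across the 32 lanes of a Wave32 wavefront, each lane holds 16 * 16 / 32 = 8 accumulator values. The constants and the helper below are assumptions for illustration only.

```python
# Hypothetical model: per-lane output (C/D) fragment size for a Wave32 WMMA.
# ASSUMPTION: a 16 x 16 output tile split evenly over 32 lanes.
WMMA_M = 16
WMMA_N = 16
WAVE_SIZE = 32


def cd_frag_size(m: int = WMMA_M, n: int = WMMA_N, lanes: int = WAVE_SIZE) -> int:
    """Number of output elements each lane holds for one m x n WMMA tile."""
    total = m * n
    if total % lanes != 0:
        raise ValueError("output tile does not divide evenly across lanes")
    return total // lanes


print(cd_frag_size())  # 8
```

Under these assumptions the result matches the literal column count of 8 in reg_tile_layout.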
reg_dtype
comptime reg_dtype = dtype
reg_tile_layout
comptime reg_tile_layout = Layout.row_major((num_n_mmas * num_m_mmas), 8)
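reg_tile_layout gives the per-lane tile one row per MMA tile (num_n_mmas * num_m_mmas rows) and 8 columns, stored row-major, so the tile holds num_m_mmas * num_n_mmas * 8 values per lane. The plain-Python sketch below models that index arithmetic. It is not the Mojo Layout API, and the helper mma_row is hypothetical: which (m, n) tile maps to which row is not documented here, so the sketch assumes m varies fastest.

```python
# Hypothetical model of reg_tile_layout = Layout.row_major(num_n_mmas * num_m_mmas, 8).
# Plain Python, not the Mojo Layout API.
FRAG_WIDTH = 8  # the literal column count in reg_tile_layout


def reg_tile_shape(num_m_mmas: int, num_n_mmas: int) -> tuple:
    """(rows, cols) of the per-lane register tile: one row per MMA tile."""
    return (num_n_mmas * num_m_mmas, FRAG_WIDTH)


def row_major_offset(row: int, col: int, num_cols: int = FRAG_WIDTH) -> int:
    """Linear register index of (row, col); columns of a row are adjacent."""
    return row * num_cols + col


def mma_row(m: int, n: int, num_m_mmas: int) -> int:
    """ASSUMPTION: m varies fastest across rows. The real tile-to-row
    mapping is not documented on this page."""
    return n * num_m_mmas + m


rows, cols = reg_tile_shape(num_m_mmas=2, num_n_mmas=4)
print(rows, cols, rows * cols)  # 8 8 64
print(row_major_offset(mma_row(1, 2, num_m_mmas=2), 3))  # 43
```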
RegisterTileType
comptime RegisterTileType = LayoutTensor[dtype, OutputRegisterBufferRDNA[dtype, num_m_mmas, num_n_mmas].reg_tile_layout, MutAnyOrigin, address_space=AddressSpace.LOCAL]
Methods
__init__
__init__(out self)
get_dtype
vectorize
vectorize(self) -> LayoutTensor[dtype, coalesce(LayoutTensor._compute_tile_layout[1, 8]()[1], True), MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=LayoutTensor._divide_tiles[1, 8]()[0], layout_int_type=_get_layout_type(OutputRegisterBufferRDNA[dtype, num_m_mmas, num_n_mmas].reg_tile_layout, AddressSpace.LOCAL), linear_idx_type=_get_index_type(OutputRegisterBufferRDNA[dtype, num_m_mmas, num_n_mmas].reg_tile_layout, AddressSpace.LOCAL)]
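Returns: A view of the register tile tiled by 1 x 8 (the [1, 8] arguments in the signature). Because each row of reg_tile_layout is 8 wide, each row becomes a single 8-wide vector element.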
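Reading the signature, vectorize appears to group the register tile into 1 x 8 vector elements (the [1, 8] arguments passed to _compute_tile_layout and _divide_tiles). Since reg_tile_layout rows are exactly 8 wide, that would turn a (rows x 8) tile into rows vector elements of width 8. The sketch below models only that grouping on a flat Python list; vectorize_rows is a hypothetical helper, not the Mojo implementation.

```python
# Hypothetical model: vectorizing a row-major (rows x width) tile with
# 1 x width vector elements, so each row becomes one vector element.
def vectorize_rows(flat: list, rows: int, width: int = 8) -> list:
    if len(flat) != rows * width:
        raise ValueError("buffer size does not match rows * width")
    # Row r occupies the contiguous slice [r * width, (r + 1) * width).
    return [flat[r * width:(r + 1) * width] for r in range(rows)]


# Two rows of 8 registers become two 8-wide vector elements.
print(vectorize_rows([float(i) for i in range(16)], rows=2))
```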
apply_softmax_denominator
apply_softmax_denominator(self, rowsum: LayoutTensor[dtype, rowsum.layout, rowsum.origin, address_space=rowsum.address_space, element_layout=rowsum.element_layout, layout_int_type=rowsum.layout_int_type, linear_idx_type=rowsum.linear_idx_type, masked=rowsum.masked, alignment=rowsum.alignment])
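Applies softmax denominator normalization to the output accumulator: each accumulated output value is divided by its row's entry in rowsum, the running sum of exponentiated scores for that row.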
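In attention kernels the output accumulator typically holds unnormalized sums of exp(score) * V, and the softmax denominator is applied once at the end by dividing each output row by its row sum. The sketch below shows that arithmetic on a logical matrix in plain Python. How the Mojo method maps rowsum entries onto register elements is not documented here, so the one-entry-per-row layout is an assumption.

```python
import math


def apply_softmax_denominator(acc: list, rowsum: list) -> None:
    """Divide every element of acc[i] by rowsum[i], in place.
    ASSUMPTION: one rowsum entry per logical accumulator row."""
    if len(acc) != len(rowsum):
        raise ValueError("one rowsum entry is needed per accumulator row")
    for i, row in enumerate(acc):
        inv = 1.0 / rowsum[i]  # one reciprocal per row, then multiply
        for j in range(len(row)):
            row[j] *= inv


# Worked example: one query row, two keys, V = 2 x 2 identity.
scores = [0.0, math.log(3.0)]
weights = [math.exp(s) for s in scores]  # unnormalized: about [1, 3]
v = [[1.0, 0.0], [0.0, 1.0]]
acc = [[sum(weights[k] * v[k][d] for k in range(2)) for d in range(2)]]
rowsum = [sum(weights)]  # about 4
apply_softmax_denominator(acc, rowsum)
print(acc)  # approximately [[0.25, 0.75]], the softmax weights
```

Deferring the division to the end means the accumulation loop only adds, and each row pays for a single reciprocal.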
zero
zero(self)
get_reg_tile
get_reg_tile[stage: Int = 0](self) -> OutputRegisterBufferRDNA[dtype, num_m_mmas, num_n_mmas].RegisterTileType
Returns: The buffer's register tile, of type RegisterTileType.