Mojo struct

FragmentToSMemWriter

@register_passable(trivial) struct FragmentToSMemWriter[c_type: DType, c_tile_layout: Layout, //, tile_n_size: Int, num_m_mmas: Int, num_consumer: Int, half_tile: Bool, WG_BM: Int, WG_BN: Int, sub_wg_id: Int, swapAB: Bool = False]

Writes WGMMA accumulator results from registers to shared memory using st.matrix.

Stores 16-byte fragments with swizzling to avoid bank conflicts. Sub-warp groups divide N-dimension work, each handling a portion of WG_BN output tiles.
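As a rough illustration of why swizzling helps, the following Python sketch models a generic XOR swizzle over 16-byte chunks. It is not the pattern produced by make_ldmatrix_swizzle; the 128-byte row width and the names CHUNK_BYTES, ROW_BYTES, and swizzled_chunk are assumptions chosen for the example. When the row pitch is a multiple of 128 bytes, unswizzled writes to the same chunk column in consecutive rows map to the same shared memory banks; XOR-ing the chunk index with the row index spreads them out.

```python
# Generic XOR swizzle model (illustrative only, not the library's exact pattern).
CHUNK_BYTES = 16                             # one 16-byte fragment, as described above
ROW_BYTES = 128                              # assumed row width: 32 banks x 4 bytes
CHUNKS_PER_ROW = ROW_BYTES // CHUNK_BYTES    # 8 chunks per row


def swizzled_chunk(row: int, chunk: int) -> int:
    """Permute the chunk index within a row using the low bits of the row index."""
    return chunk ^ (row % CHUNKS_PER_ROW)


# Eight consecutive rows writing logical chunk 0 land in eight distinct physical chunks.
physical = [swizzled_chunk(row, 0) for row in range(CHUNKS_PER_ROW)]

# Within any single row the mapping is a permutation, so no data is overwritten.
row_3 = [swizzled_chunk(3, chunk) for chunk in range(CHUNKS_PER_ROW)]
```

Because XOR with a fixed value is its own inverse, the same function recovers the logical chunk index when reading the data back.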

Parameters

c_type (DType): Output data type (must be bfloat16 for st.matrix).
c_tile_layout (Layout): Layout of the entire shared memory region.
tile_n_size (Int): Width of each output tile (typically TMA_BN).
num_m_mmas (Int): Number of MMA operations in M dimension.
num_consumer (Int): Number of consumer warp groups.
half_tile (Bool): Special mode for handling partial tiles.
WG_BM (Int): Warp group tile height.
WG_BN (Int): Warp group tile width.
sub_wg_id (Int): Which portion of WG_BN this instance handles.
swapAB (Bool): Whether to swap the A and B matrices.

Fields

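c_tile (LayoutTensor[c_type, c_tile_layout, MutAnyOrigin, address_space=AddressSpace.SHARED, alignment=128]): Shared memory tile to write to.
warp_group_thread_idx (UInt): Thread index within the warp group.
local_warp_group_idx (UInt): Sub-warp group index (divides N-dimension work).
st_matrix_rt_layout (FragmentToSMemWriter[tile_n_size, num_m_mmas, num_consumer, half_tile, WG_BM, WG_BN, sub_wg_id, swapAB].st_matrix_rt_layout_type): Runtime layout of type st_matrix_rt_layout_type.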

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegTileWriter, RegisterPassable, TrivialRegisterPassable

`comptime` members

`__copy_ctor_is_trivial`

comptime __copy_ctor_is_trivial = True

`__del__is_trivial`

comptime __del__is_trivial = True

`__move_ctor_is_trivial`

comptime __move_ctor_is_trivial = True

`st_matrix_layout`

comptime st_matrix_layout = Layout.row_major(WG_BM, tile_n_size) if swapAB.__invert__()._mlir_value else Layout.row_major(tile_n_size, WG_BN)
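The condition `swapAB.__invert__()._mlir_value` reads as "not swapAB". A small Python model of the shape selection, assuming hypothetical tile sizes (the function name st_matrix_shape and the numbers below are illustrative, not taken from the library):

```python
def st_matrix_shape(wg_bm: int, wg_bn: int, tile_n_size: int, swap_ab: bool = False):
    """Return the (rows, cols) of the row-major layout chosen by st_matrix_layout."""
    if not swap_ab:
        # Default case: full warp-group height, one output tile wide.
        return (wg_bm, tile_n_size)
    # swapAB case: one output tile tall, full warp-group width.
    return (tile_n_size, wg_bn)


# Hypothetical sizes for illustration only.
regular_shape = st_matrix_shape(wg_bm=64, wg_bn=256, tile_n_size=64)
swapped_shape = st_matrix_shape(wg_bm=64, wg_bn=256, tile_n_size=64, swap_ab=True)
```

With these assumed sizes the default shape is 64 x 64 and the swapped shape is 64 x 256.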

`st_matrix_layout_regular`

comptime st_matrix_layout_regular = st_matrix_n_layout[c_type, tile_n_size, num_m_mmas, num_consumer]()

`st_matrix_layout_transpose`

comptime st_matrix_layout_transpose = st_matrix_m_layout[c_type, tile_n_size, num_m_mmas, num_consumer]()

`st_matrix_rt_layout_type`

comptime st_matrix_rt_layout_type = RuntimeLayout[FragmentToSMemWriter[tile_n_size, num_m_mmas, num_consumer, half_tile, WG_BM, WG_BN, sub_wg_id, swapAB].st_matrix_layout_regular if swapAB.__invert__()._mlir_value else FragmentToSMemWriter[tile_n_size, num_m_mmas, num_consumer, half_tile, WG_BM, WG_BN, sub_wg_id, swapAB].st_matrix_layout_transpose, element_type=DType.int32, linear_idx_type=DType.int32]

`st_matrix_swizzle`

comptime st_matrix_swizzle = make_ldmatrix_swizzle[c_type, tile_n_size if swapAB.__invert__()._mlir_value else WG_BN, log2_floor((16 // size_of[c_type]()))]()
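The third argument is the base-2 logarithm of the number of elements in one 16-byte fragment. A worked example in Python for bfloat16, which is 2 bytes wide (the helper below is a stand-in written for this sketch, not the library's log2_floor):

```python
def log2_floor(n: int) -> int:
    """Floor of log2(n) for a positive integer (stand-in helper for this sketch)."""
    return n.bit_length() - 1


BF16_BYTES = 2                                  # size of one bfloat16 element
elements_per_fragment = 16 // BF16_BYTES        # elements in a 16-byte fragment
log2_elements = log2_floor(elements_per_fragment)
```

For bfloat16 this gives 8 elements per fragment and a third argument of 3.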

Methods

`__init__`

__init__(c_tile: LayoutTensor[c_type, c_tile_layout, MutAnyOrigin, address_space=AddressSpace.SHARED, alignment=128], warp_group_thread_idx: Scalar[DType.uint], local_warp_group_idx: Scalar[DType.uint]) -> Self

Initialize the fragment writer.

Args:

c_tile (LayoutTensor): Shared memory tile to write to.
warp_group_thread_idx (Scalar): Thread index within the warp group.
local_warp_group_idx (Scalar): Sub-warp group index (divides N-dimension work).

`write_tile`

write_tile(self, c_reg_tile: LayoutTensor[c_reg_tile._dtype, c_reg_tile.layout, MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=c_reg_tile.element_layout, layout_int_type=c_reg_tile.layout_int_type, linear_idx_type=c_reg_tile.linear_idx_type, masked=c_reg_tile.masked, alignment=c_reg_tile.alignment], coords: Tuple[UInt, UInt])

Write accumulator tile from registers to shared memory.

Args:

c_reg_tile (LayoutTensor): Register tile containing MMA results.
coords (Tuple): Tile position (row_idx, col_idx) in output.
