For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Mojo struct

RegisterToGMemWriter

struct RegisterToGMemWriter[c_type: DType, dst_layout: TensorLayout, dst_origin: MutOrigin, dst_linear_idx_type: DType, dst_element_size: Int, //, wgmma_shape: IndexList[3], num_consumer: Int, N: Int, epilogue_fn: Optional[def[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], SIMD[dtype, width]) capturing -> None] = None, compute_lambda_fn: Optional[def[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], SIMD[dtype, width]) capturing -> SIMD[dtype, width]] = None, check_runtime_bounds: Bool = False, swapAB: Bool = False]

Writer for transferring accumulator registers directly to global memory.

This writer handles the direct copy from register tiles to global memory tiles, with proper thread distribution and alignment. It supports optional epilogue processing, compute lambda transformations, and bounds checking.

Parameters

c_type (DType): Output data type.
dst_layout (TensorLayout): Layout of the destination tensor.
dst_origin (MutOrigin): Origin type of the destination tensor.
dst_linear_idx_type (DType): Linear index type for destination tensor.
dst_element_size (Int): Number of scalar elements per destination element.
wgmma_shape (IndexList[3]): Shape of the WGMMA operation [M, N, K].
num_consumer (Int): Number of consumer warp groups.
N (Int): Matrix N dimension.
epilogue_fn (Optional[def[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], SIMD[dtype, width]) capturing -> None]): Optional epilogue function (mutates value in place).
compute_lambda_fn (Optional[def[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], SIMD[dtype, width]) capturing -> SIMD[dtype, width]]): Optional compute lambda function (returns new value).
check_runtime_bounds (Bool): Whether to perform bounds checking on N dimension.
swapAB (Bool): Whether to swap the A and B matrices.

Fields

thread_info (ThreadInfo):
dst (RegisterToGMemWriter[wgmma_shape, num_consumer, N, epilogue_fn, compute_lambda_fn, check_runtime_bounds, swapAB].DstType):
num_m_mmas (Int):
tile_coords (OptionalReg[TileCoordinates]):
max_row (OptionalReg[UInt32]):

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegTileWriter, RegisterPassable, TrivialRegisterPassable

`comptime` members

`c_frag_size`

comptime c_frag_size = ((wgmma_shape[0] * wgmma_shape[1]) // WARPGROUP_SIZE)

`DstType`

comptime DstType = TileTensor[c_type, dst_layout, dst_origin, linear_idx_type=dst_linear_idx_type, element_size=dst_element_size]

`num_frag_mats`

comptime num_frag_mats = (RegisterToGMemWriter[wgmma_shape, num_consumer, N, epilogue_fn, compute_lambda_fn, check_runtime_bounds, swapAB].num_n_frag_mat * RegisterToGMemWriter[wgmma_shape, num_consumer, N, epilogue_fn, compute_lambda_fn, check_runtime_bounds, swapAB].num_m_frag_mat)

`num_m_frag_mat`

comptime num_m_frag_mat = ((wgmma_shape[0] // 4) // 8)

`num_n_frag_mat`

comptime num_n_frag_mat = (wgmma_shape[1] // 8)

Methods

`init`

def __init__(dst: TileTensor[c_type, dst_layout, dst_origin, linear_idx_type=dst_linear_idx_type, element_size=dst_element_size], warp_group_thread_idx: Int, num_m_mmas: Int, tile_coords: OptionalReg[TileCoordinates] = None, max_row: OptionalReg[UInt32] = None) -> Self

Initialize the register-to-global-memory writer.

Args:

dst (TileTensor[c_type, dst_layout, dst_origin, linear_idx_type=dst_linear_idx_type, element_size=dst_element_size]): Destination tensor in global memory.
warp_group_thread_idx (Int): Thread index within the warp group.
num_m_mmas (Int): Number of MMA tiles in M dimension.
tile_coords (OptionalReg[TileCoordinates]): Optional tile coordinates for epilogue processing.
max_row (OptionalReg[UInt32]): Optional maximum valid M coordinate (for epilogue).

`write_tile`

def write_tile(self, c_reg_tile: LayoutTensor[MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=c_reg_tile.element_layout, layout_int_type=c_reg_tile.layout_int_type, linear_idx_type=c_reg_tile.linear_idx_type, masked=c_reg_tile.masked, alignment=c_reg_tile.alignment], coords: Tuple[Int, Int])

Write a single MMA tile from registers to global memory.

Args:

c_reg_tile (LayoutTensor[MutAnyOrigin, address_space=AddressSpace.LOCAL, element_layout=c_reg_tile.element_layout, layout_int_type=c_reg_tile.layout_int_type, linear_idx_type=c_reg_tile.linear_idx_type, masked=c_reg_tile.masked, alignment=c_reg_tile.alignment]): Register tile containing accumulator values.
coords (Tuple[Int, Int]): Tile coordinates (row, column) in the destination matrix.

Parameters​

Fields​

Implemented traits​

comptime members​

c_frag_size​

DstType​

num_frag_mats​

num_m_frag_mat​

num_n_frag_mat​

Methods​

__init__​

write_tile​