Mojo trait
TileLoader
DRAM→LDS DMA loader contract for tile_rows × tile_cols half-tiles.
Implementations cooperate as a warp group to fill an SMEM half-tile
via buffer_load_*_lds. The kernel walks coords in (m_offset, k_offset) GEMM-space; the loader translates them to physical
addresses internally. Conformers must be TrivialRegisterPassable
so the kernel can pass them by value through closures.
Two conformers ship today:
- TileLoaderLDS: linear 2D source. Used by matmul A/B operands and by conv's B (filter) operand. The address math is addr = (m_offset * stride) + k_offset.
- TileLoaderLDSIm2col: NHWC + in-line im2col. Used by conv's A (input) operand. The address math decomposes m_offset → (n, h_out, w_out) and k_offset → (kh, kw, c) at load time; conv geometry (R, S, H, W, stride, dilation, pad) is loader-internal state.
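The two address computations above can be modeled on the host in plain Python. This is a minimal sketch and not the loaders' actual code: the channel count C, the derived H_out and W_out, the row-major ordering of (n, h_out, w_out) and (kh, kw, c), and the choice to return None for taps that land in padding are all assumptions made for illustration.

```python
def linear_addr(m_offset, k_offset, stride):
    """Element offset into a row-major 2D source (the TileLoaderLDS formula)."""
    return m_offset * stride + k_offset


def im2col_addr(m_offset, k_offset, *, H, W, C, R, S,
                stride=1, dilation=1, pad=0):
    """Element offset into an NHWC input, or None if the tap lies in padding.

    Assumed ordering: M enumerates (n, h_out, w_out) row-major and
    K enumerates (kh, kw, c) row-major.
    """
    # Standard convolution output size along each spatial axis.
    H_out = (H + 2 * pad - dilation * (R - 1) - 1) // stride + 1
    W_out = (W + 2 * pad - dilation * (S - 1) - 1) // stride + 1

    # m_offset -> (n, h_out, w_out)
    n, rem = divmod(m_offset, H_out * W_out)
    h_out, w_out = divmod(rem, W_out)
    # k_offset -> (kh, kw, c)
    kh, rem = divmod(k_offset, S * C)
    kw, c = divmod(rem, C)

    # Input pixel touched by this (output position, filter tap) pair.
    h = h_out * stride - pad + kh * dilation
    w = w_out * stride - pad + kw * dilation
    if not (0 <= h < H and 0 <= w < W):
        return None  # illustrative handling of padded taps
    return ((n * H + h) * W + w) * C + c
```

With a 1 × 1 filter, unit stride, and no padding, the im2col form collapses to the linear one with stride equal to C, which is a convenient sanity check for the decomposition.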
The kernel doesn't have to know which loader is in use; it just
advances (m_offset, k_offset) through the K-loop. That's the
point of the trait: the conv body and matmul body can share
everything except which loader they instantiate.
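The sharing described above can be illustrated with a small Python model. It is a sketch under stated assumptions: LinearLoader, Im2colLoader, and k_loop are hypothetical stand-ins written for this example, the im2col variant is simplified to stride 1 with no padding or dilation, and plain lists stand in for global memory and the SMEM half-tile.

```python
class LinearLoader:
    """Toy linear 2D loader (a stand-in, not the real TileLoaderLDS)."""

    def __init__(self, src, stride, tile_rows, tile_cols):
        self.src, self.stride = src, stride
        self.tile_rows, self.tile_cols = tile_rows, tile_cols

    def _addr(self, m, k):
        return m * self.stride + k

    def load_tile(self, dst, m_offset, k_offset):
        for r in range(self.tile_rows):
            for c in range(self.tile_cols):
                dst[r][c] = self.src[self._addr(m_offset + r, k_offset + c)]


class Im2colLoader(LinearLoader):
    """Toy NHWC im2col loader: stride 1, no padding, no dilation."""

    def __init__(self, src, H, W, C, R, S, tile_rows, tile_cols):
        self.src, self.H, self.W, self.C, self.S = src, H, W, C, S
        self.H_out, self.W_out = H - R + 1, W - S + 1
        self.tile_rows, self.tile_cols = tile_rows, tile_cols

    def _addr(self, m, k):
        n, rem = divmod(m, self.H_out * self.W_out)
        h_out, w_out = divmod(rem, self.W_out)
        kh, rem = divmod(k, self.S * self.C)
        kw, c = divmod(rem, self.C)
        return ((n * self.H + h_out + kh) * self.W + w_out + kw) * self.C + c


def k_loop(loader, m_offset, K):
    """Shared kernel body: advances k_offset and never inspects the loader type."""
    tiles = []
    for k_offset in range(0, K, loader.tile_cols):
        dst = [[0] * loader.tile_cols for _ in range(loader.tile_rows)]
        loader.load_tile(dst, m_offset, k_offset)
        tiles.append(dst)
    return tiles
```

The same k_loop runs unchanged with either loader; only the constructor call differs, which mirrors the split between kernel body and loader that the trait is meant to provide.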
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
comptime members
dtype
comptime dtype
tile_cols
comptime tile_cols
tile_rows
comptime tile_rows
Required methods
__init__
__init__(out self: _Self, *, copy: _Self)
Create a new instance of the value by copying an existing one.
Args:
- copy (_Self): The value to copy.
Returns:
_Self
__init__(out self: _Self, *, deinit take: _Self)
Create a new instance of the value by moving the value of another.
Args:
- take (_Self): The value to move.
Returns:
_Self
load_tile
load_tile(self: _Self, dst: TileTensor[_Self.dtype, address_space=AddressSpace.SHARED], m_offset: Int, k_offset: Int)
Loads a half-tile from global memory into the SMEM dst.
Issues num_iterations buffer_load_*_lds bursts (per lane)
that together fill the tile_rows × tile_cols SMEM half-tile.
Each iteration costs one vmcnt-tracked outstanding load per
lane; the 4-wave software pipeline relies on this exact
accounting.
Args:
- dst (TileTensor[_Self.dtype, address_space=AddressSpace.SHARED]): Destination half-tile in SHARED address space, sized tile_rows × tile_cols.
- m_offset (Int): Row offset (M dimension) of the sub-tile origin in GEMM space.
- k_offset (Int): Column offset (K dimension) of the sub-tile origin in GEMM space.
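The per-lane load accounting mentioned above can be pictured with a toy Python model. Everything here is an assumption made for illustration: the formula for the iteration count, the example tile shape, lane count, and bytes per load, and the in-order retirement of loads are not taken from the loader's implementation. LaneLoadQueue and its methods are hypothetical names.

```python
from collections import deque


def num_iterations(tile_rows, tile_cols, elem_bytes, lanes, bytes_per_lane_load):
    """Bursts needed for all lanes together to cover one half-tile (assumed formula)."""
    tile_bytes = tile_rows * tile_cols * elem_bytes
    burst_bytes = lanes * bytes_per_lane_load
    assert tile_bytes % burst_bytes == 0, "tile must be a whole number of bursts"
    return tile_bytes // burst_bytes


class LaneLoadQueue:
    """One lane's view of its outstanding loads, assumed to retire in issue order."""

    def __init__(self):
        self.in_flight = deque()

    def issue_tile(self, name, iterations):
        # One outstanding load per iteration, as the description above states.
        for i in range(iterations):
            self.in_flight.append((name, i))

    @property
    def vmcnt(self):
        return len(self.in_flight)

    def wait_vmcnt(self, n):
        """Retire the oldest loads until at most n remain; return what landed."""
        landed = []
        while len(self.in_flight) > n:
            landed.append(self.in_flight.popleft())
        return landed
```

In this model, issuing two half-tiles back to back and then waiting until the counter drops to one tile's iteration count retires exactly the older tile. That is the kind of reasoning a software pipeline can only do if each load_tile call issues a fixed, known number of loads per lane.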
Provided methods
copy
copy(self: _Self) -> _Self
Explicitly construct a copy of self. This is a convenience method for Self(copy=self) when the type is inconvenient to write out.
Returns:
_Self: A copy of this value.