
Mojo trait

TileLoader

DRAM→LDS DMA loader contract for tile_rows × tile_cols half-tiles.

Implementations cooperate as a warp group to fill an SMEM half-tile via buffer_load_*_lds. The kernel walks coords in (m_offset, k_offset) GEMM-space; the loader translates them to physical addresses internally. Conformers must be TrivialRegisterPassable so the kernel can pass them by value through closures.

Two conformers ship today:

  • TileLoaderLDS: linear 2D source. Used by matmul A/B operands and by conv's B (filter) operand. The address math is addr = (m_offset * stride) + k_offset.
  • TileLoaderLDSIm2col: NHWC + in-line im2col. Used by conv's A (input) operand. The address math decomposes m_offset → (n, h_out, w_out) and k_offset → (kh, kw, c) at load time; conv geometry (R, S, H, W, stride, dilation, pad) is loader-internal state.
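The two address computations above can be modelled in a few lines. This is a Python sketch rather than Mojo, and it is not the shipped implementation: `ConvGeometry`, `linear_addr`, and `im2col_addr` are hypothetical names, and the row-major NHWC layout, the symmetric stride/dilation/pad, and the use of `None` for a padded position are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def linear_addr(m_offset: int, k_offset: int, stride: int) -> int:
    """Element offset in a linear 2D source: (m_offset * stride) + k_offset."""
    return m_offset * stride + k_offset


@dataclass(frozen=True)
class ConvGeometry:
    """Hypothetical stand-in for the loader-internal conv state."""
    H: int             # input height
    W: int             # input width
    C: int             # input channels
    R: int             # filter height
    S: int             # filter width
    stride: int = 1    # assumed equal in both spatial dimensions
    dilation: int = 1  # assumed equal in both spatial dimensions
    pad: int = 0       # assumed equal on every side

    def out_size(self, in_size: int, filt: int) -> int:
        """Standard convolution output-size formula for one dimension."""
        span = self.dilation * (filt - 1) + 1
        return (in_size + 2 * self.pad - span) // self.stride + 1


def im2col_addr(g: ConvGeometry, m_offset: int, k_offset: int) -> Optional[int]:
    """Map a GEMM-space coordinate to an NHWC element offset.

    Returns None when the coordinate lands in the padding region; how the
    real loader treats that case is not specified here.
    """
    h_out_n = g.out_size(g.H, g.R)
    w_out_n = g.out_size(g.W, g.S)

    # m_offset -> (n, h_out, w_out)
    n, rem = divmod(m_offset, h_out_n * w_out_n)
    h_out, w_out = divmod(rem, w_out_n)

    # k_offset -> (kh, kw, c)
    kh, rem = divmod(k_offset, g.S * g.C)
    kw, c = divmod(rem, g.C)

    # Input pixel touched by this (output position, filter tap) pair.
    h = h_out * g.stride - g.pad + kh * g.dilation
    w = w_out * g.stride - g.pad + kw * g.dilation
    if not (0 <= h < g.H and 0 <= w < g.W):
        return None
    return ((n * g.H + h) * g.W + w) * g.C + c
```

Under these assumptions, a 1 × 1 filter with unit stride and no padding makes `im2col_addr` collapse to `linear_addr` with `stride = C`, which is a quick sanity check that the two loaders describe the same GEMM space.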

The kernel doesn't have to know which loader is in use; it just advances (m_offset, k_offset) through the K-loop. That's the point of the trait: the conv body and matmul body can share everything except which loader they instantiate.
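To make the sharing concrete, here is a small Python model (not Mojo, and not the real kernel) in which one K-loop body drives two different loaders. Every name in it (`TileLoaderModel`, `LinearLoader`, `Im2colLoader`, `run_k_loop`) is hypothetical, plain lists stand in for DRAM and SMEM, and the im2col variant is simplified to unit stride, unit dilation, and no padding.

```python
from typing import List, Protocol


class TileLoaderModel(Protocol):
    """Python stand-in for the trait: tile shape plus a load_tile method."""
    tile_rows: int
    tile_cols: int

    def load_tile(self, dst: List[int], m_offset: int, k_offset: int) -> None: ...


class LinearLoader:
    """Linear 2D source: addr = m * stride + k."""

    def __init__(self, src: List[int], stride: int, tile_rows: int, tile_cols: int):
        self.src, self.stride = src, stride
        self.tile_rows, self.tile_cols = tile_rows, tile_cols

    def load_tile(self, dst: List[int], m_offset: int, k_offset: int) -> None:
        for r in range(self.tile_rows):
            for c in range(self.tile_cols):
                addr = (m_offset + r) * self.stride + (k_offset + c)
                dst[r * self.tile_cols + c] = self.src[addr]


class Im2colLoader:
    """NHWC source with the conv geometry held as loader-internal state."""

    def __init__(self, src: List[int], H: int, W: int, C: int, R: int, S: int,
                 tile_rows: int, tile_cols: int):
        self.src, self.H, self.W, self.C, self.S = src, H, W, C, S
        self.h_out_n, self.w_out_n = H - R + 1, W - S + 1
        self.tile_rows, self.tile_cols = tile_rows, tile_cols

    def _addr(self, m: int, k: int) -> int:
        n, rem = divmod(m, self.h_out_n * self.w_out_n)
        h_out, w_out = divmod(rem, self.w_out_n)
        kh, rem = divmod(k, self.S * self.C)
        kw, c = divmod(rem, self.C)
        return ((n * self.H + h_out + kh) * self.W + w_out + kw) * self.C + c

    def load_tile(self, dst: List[int], m_offset: int, k_offset: int) -> None:
        for r in range(self.tile_rows):
            for c in range(self.tile_cols):
                dst[r * self.tile_cols + c] = self.src[self._addr(m_offset + r, k_offset + c)]


def run_k_loop(loader: TileLoaderModel, m_offset: int, K: int) -> List[List[int]]:
    """Shared kernel body: advance k_offset and never inspect the loader type."""
    tiles = []
    for k_offset in range(0, K, loader.tile_cols):
        dst = [0] * (loader.tile_rows * loader.tile_cols)
        loader.load_tile(dst, m_offset, k_offset)
        tiles.append(dst)
    return tiles
```

`run_k_loop` is written once; choosing matmul or conv behaviour is only a matter of which loader object is passed in. In this model a 1 × 1 convolution over a 1 × 2 × 4 NHWC input yields the same tiles as a linear load from a 2 × 4 matrix.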

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

dtype

comptime dtype

tile_cols

comptime tile_cols

tile_rows

comptime tile_rows

Required methods

__init__

__init__(out self: _Self, *, copy: _Self)

Create a new instance of the value by copying an existing one.

Args:

  • copy (_Self): The value to copy.

Returns:

_Self

__init__(out self: _Self, *, deinit take: _Self)

Create a new instance of the value by moving the value of another.

Args:

  • take (_Self): The value to move.

Returns:

_Self

load_tile

load_tile(self: _Self, dst: TileTensor[_Self.dtype, address_space=AddressSpace.SHARED], m_offset: Int, k_offset: Int)

Loads a half-tile from global memory into the SMEM dst.

Issues num_iterations buffer_load_*_lds bursts (per lane) that together fill the tile_rows × tile_cols SMEM half-tile. Each iteration costs one vmcnt-tracked outstanding load per lane; the 4-wave software pipeline relies on this exact accounting.

Args:

  • dst (TileTensor[_Self.dtype, address_space=AddressSpace.SHARED]): Destination half-tile in SHARED address space, sized tile_rows × tile_cols.
  • m_offset (Int): Row offset (M dimension) of the sub-tile origin in GEMM space.
  • k_offset (Int): Column offset (K dimension) of the sub-tile origin in GEMM space.

Provided methods

copy

copy(self: _Self) -> _Self

Explicitly constructs a copy of self. This is a convenience method for Self(copy=self) when the type is inconvenient to write out.

Returns:

_Self: A copy of this value.