Mojo module

common

Shared NVIDIA GPU attention primitives used by both SM90 and SM100 kernels.

This module hosts helpers that are not architecture-specific so that neither sm90/ nor sm100/ has to import from the other. It currently provides:

elect(): single-lane election via the elect.sync PTX instruction.
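To make "single-lane election" concrete, here is a small host-side Python model of the idea: out of the lanes named in a 32-bit member mask, exactly one is marked as leader, and the same mask always yields the same leader. This is only a sketch. The names `elect_leader` and `elect_predicates` are hypothetical, and choosing the lowest set lane is an assumption made for illustration; it is not taken from this module or guaranteed by the description above.

```python
WARP_SIZE = 32


def elect_leader(membermask: int) -> int:
    """Return the lane index picked as leader for a 32-bit member mask.

    Assumption for illustration only: the lowest participating lane wins.
    """
    if not 0 < membermask < (1 << WARP_SIZE):
        raise ValueError("membermask must be a non-zero 32-bit value")
    # Isolate the lowest set bit, then convert it to a lane index.
    return (membermask & -membermask).bit_length() - 1


def elect_predicates(membermask: int) -> list:
    """Per-lane predicate: True only for the single elected lane."""
    leader = elect_leader(membermask)
    return [lane == leader for lane in range(WARP_SIZE)]


if __name__ == "__main__":
    preds = elect_predicates(0xFFFFFFFF)
    print("leader lane:", preds.index(True))
    print("lanes elected:", sum(preds))
```

The property that matters for kernel code is that work guarded by the predicate runs on one lane rather than on all 32.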

`comptime` values

`ImmutTileTensor1D`

comptime ImmutTileTensor1D[dtype: DType] = TileTensor[dtype, Layout[*?, *?], ImmutAnyOrigin]

Parameters

dtype (DType):

`KVTMATile`

comptime KVTMATile[dtype: DType, swizzle_mode: TensorMapSwizzle, *, BN: Int, BK: Int] = TMATensorTile[dtype, 3, _padded_shape[3, dtype, IndexList(BN, 1, BK, __list_literal__=NoneType(None)), swizzle_mode](), _ragged_shape[3, dtype, IndexList(BN, 1, BK, __list_literal__=NoneType(None)), swizzle_mode]()]

Parameters

dtype (DType):
swizzle_mode (TensorMapSwizzle):
BN (Int):
BK (Int):

`QTMATile`

comptime QTMATile[dtype: DType, swizzle_mode: TensorMapSwizzle, *, BM: Int, depth: Int, group: Int, decoding: Bool, fuse_gqa: Bool = False, num_qk_stages: Int = 1] = TMATensorTile[dtype, 4 if decoding or fuse_gqa else 3, _padded_shape[4 if decoding or fuse_gqa else 3, dtype, q_smem_shape[dtype, swizzle_mode, BM=BM, group=group, depth=depth, decoding=decoding, fuse_gqa=fuse_gqa, num_qk_stages=num_qk_stages](), swizzle_mode](), _ragged_shape[4 if decoding or fuse_gqa else 3, dtype, q_smem_shape[dtype, swizzle_mode, BM=BM, group=group, depth=depth, decoding=decoding, fuse_gqa=fuse_gqa, num_qk_stages=num_qk_stages](), swizzle_mode]()]

Parameters

dtype (DType):
swizzle_mode (TensorMapSwizzle):
BM (Int):
depth (Int):
group (Int):
decoding (Bool):
fuse_gqa (Bool):
num_qk_stages (Int):
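The `QTMATile` definition selects its tensor rank with the expression `4 if decoding or fuse_gqa else 3`. The Python sketch below mirrors only that selection so the four flag combinations are easy to see; `q_tile_rank` is a hypothetical helper name and is not part of this module.

```python
def q_tile_rank(decoding: bool, fuse_gqa: bool = False) -> int:
    """Mirror the rank expression from the QTMATile signature.

    Hypothetical helper for illustration; fuse_gqa defaults to False,
    matching the default shown in the signature.
    """
    return 4 if decoding or fuse_gqa else 3


if __name__ == "__main__":
    for decoding in (False, True):
        for fuse_gqa in (False, True):
            rank = q_tile_rank(decoding, fuse_gqa)
            print(f"decoding={decoding}, fuse_gqa={fuse_gqa} -> rank {rank}")
```

Only the combination `decoding=False, fuse_gqa=False` yields a rank-3 tile; setting either flag produces a rank-4 tile.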

Structs

MHAPosition: Position of the MHA kernel. When decoding=False, q_head_stride == q_num_heads. When decoding=True, q_head_stride == 1.
NonNullPointer:
NullPointer:
Pack:
PositionSummary:
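The stride rule stated for MHAPosition can be restated as a one-line function. This is a minimal Python sketch of that rule alone; the standalone function form and its name are assumptions for illustration, and nothing else about the struct is modeled here.

```python
def q_head_stride(q_num_heads: int, decoding: bool) -> int:
    """Restate the documented rule (hypothetical standalone form).

    decoding=False -> stride equals q_num_heads
    decoding=True  -> stride equals 1
    """
    return 1 if decoding else q_num_heads


if __name__ == "__main__":
    print(q_head_stride(32, decoding=False))
    print(q_head_stride(32, decoding=True))
```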

Traits

OptionalPointer:

Functions

elect:
get_seq_info:
kv_coord:
output_reg_to_smem_st_matrix:
q_coord: Returns the coordinates for a TMA load on the Q matrix. This load can be 3D, 4D, or 5D.
q_gmem_shape:
q_smem_shape:
q_tma:
