Mojo module

kernel_common

Shared kernel components for SM100 warp-specialized matmul kernels.

This module contains common components used by all SM100 matmul kernel variants:

WarpRole: Warp specialization roles for 4-warp kernels (MMA, Load, Scheduler, Epilogue)
WarpRole1D1D: Warp specialization roles for 3-warp kernels (MMA, Load, Epilogue)
KernelContext: Common kernel state (election vars, CTA coords, masks)
Barrier init helpers: compute_input_consumer_count, init_core_barriers, init_clc_barriers
_Batched3DLayout / _to_batched_3d: Reshape 2D TileTensor to 3D (batch=1)

`comptime` values

`MbarPtr`

comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutUntrackedOrigin, address_space=AddressSpace.SHARED]

Structs

KernelContext: Shared kernel state: election vars, CTA coords, multicast masks, pipeline states.
WarpRole: Warp role identifiers for SM100 warp-specialized kernels.
WarpRole1D1D: Warp role identifiers for 1D-1D kernels with warp specialization.

Functions

compute_accum_barrier_counts: Compute accumulator pipeline barrier arrival counts.
compute_clc_barrier_counts: Compute CLC barrier arrival counts.
compute_input_consumer_count: Compute input pipeline barrier consumer count.
compute_tma_tile_dims: Compute TMA tile dimensions (a_tile_dim0, b_tile_dim0, c_tile_dim0).
init_clc_barriers: Initialize CLC full/empty barrier pairs.
init_core_barriers: Initialize input, output, and TMEM deallocation barriers.
