Mojo module
kernel_common
Shared kernel components for SM100 warp-specialized matmul kernels.
This module contains common components used by all SM100 matmul kernel variants:
- WarpRole: Warp specialization roles for 4-warp kernels (MMA, Load, Scheduler, Epilogue)
- WarpRole1D1D: Warp specialization roles for 3-warp kernels (MMA, Load, Epilogue)
- KernelContext: Common kernel state (election vars, CTA coords, masks)
- Barrier init helpers: compute_input_consumer_count, init_core_barriers, init_clc_barriers
- _Batched3DLayout / _to_batched_3d: Reshape 2D TileTensor to 3D (batch=1)
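The reshape described in the last item can be illustrated with a small Python sketch. It assumes a simple shape-plus-strides layout model (the `Layout`, `row_major`, `to_batched_3d`, and `offset` names are hypothetical, not the Mojo `_Batched3DLayout` / `_to_batched_3d` API) and shows why prepending a batch dimension of extent 1 leaves every element address unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for a tensor layout: a shape plus strides."""
    shape: tuple[int, ...]
    stride: tuple[int, ...]


def row_major(shape: tuple[int, ...]) -> Layout:
    # Innermost dimension has stride 1; each outer stride is the
    # product of the extents inside it.
    strides = []
    acc = 1
    for extent in reversed(shape):
        strides.append(acc)
        acc *= extent
    return Layout(shape, tuple(reversed(strides)))


def to_batched_3d(layout: Layout) -> Layout:
    """Prepend a batch dimension of extent 1 to a 2D layout (hypothetical helper)."""
    if len(layout.shape) != 2:
        raise ValueError("expected a 2D layout")
    m, n = layout.shape
    # Assumes a compact 2D footprint. With batch extent 1 the only valid
    # batch index is 0, so this stride never changes an offset.
    batch_stride = m * n
    return Layout((1, m, n), (batch_stride,) + layout.stride)


def offset(layout: Layout, coords: tuple[int, ...]) -> int:
    # Linear element offset: dot product of coordinates and strides.
    return sum(c * s for c, s in zip(coords, layout.stride))


l2 = row_major((128, 64))
l3 = to_batched_3d(l2)
print(l3.shape, l3.stride)  # (1, 128, 64) (8192, 64, 1)
```

Because the batch coordinate is always 0, `offset(l3, (0, i, j))` equals `offset(l2, (i, j))`, which is what lets a batched kernel consume an unbatched 2D tensor without copying it.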
comptime values
MbarPtr
comptime MbarPtr = UnsafePointer[SharedMemBarrier, MutAnyOrigin, address_space=AddressSpace.SHARED]
Structs
- KernelContext: Shared kernel state: election vars, CTA coords, multicast masks, pipeline states.
- WarpRole: Warp role identifiers for SM100 warp-specialized kernels.
- WarpRole1D1D: Warp roles for 1D-1D kernels with warp specialization.
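Warp specialization means every warp in a CTA runs the same kernel, and a role check picks which loop each warp executes. The Python sketch below illustrates that dispatch pattern for the 4-role (MMA, Load, Scheduler, Epilogue) and 3-role (MMA, Load, Epilogue) variants. The warp-index-to-role tables and branch labels are hypothetical; the real `WarpRole` and `WarpRole1D1D` structs may assign roles to different warp indices or to several warps per role.

```python
from enum import Enum


class Role(Enum):
    # Role names mirror the module description; the values are arbitrary.
    LOAD = 0
    MMA = 1
    SCHEDULER = 2
    EPILOGUE = 3


# Hypothetical warp-index-to-role tables (not the real WarpRole numbering).
FOUR_ROLE_TABLE = {0: Role.LOAD, 1: Role.MMA, 2: Role.SCHEDULER, 3: Role.EPILOGUE}
ONE_D_ONE_D_TABLE = {0: Role.LOAD, 1: Role.MMA, 2: Role.EPILOGUE}  # no scheduler warp


def warp_body(warp_id: int, table: dict[int, Role]) -> str:
    """Dispatch one warp to its specialized loop based on its role.

    In a real kernel each branch would be a long-running loop; here each
    branch only returns a label describing what that warp would do.
    """
    role = table[warp_id]
    if role is Role.LOAD:
        return "load: copy A/B tiles into shared memory"
    if role is Role.MMA:
        return "mma: consume loaded tiles and accumulate"
    if role is Role.SCHEDULER:
        return "scheduler: fetch the next work tile"
    return "epilogue: write the accumulator out as C"


for w in sorted(FOUR_ROLE_TABLE):
    print(w, warp_body(w, FOUR_ROLE_TABLE))
```

The 3-role table simply omits the scheduler entry, which is the structural difference between the two role types as the module description states it.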
Functions
- compute_accum_barrier_counts: Compute accumulator pipeline barrier arrival counts.
- compute_clc_barrier_counts: Compute CLC barrier arrival counts.
- compute_input_consumer_count: Compute input pipeline barrier consumer count.
- compute_tma_tile_dims: Compute TMA tile dimensions (a_tile_dim0, b_tile_dim0, c_tile_dim0).
- init_clc_barriers: Initialize CLC full/empty barrier pairs.
- init_core_barriers: Initialize input, output, and TMEM deallocation barriers.
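The counts produced by the `compute_*` helpers are presumably expected-arrival counts for shared-memory barriers. As background, the following is a simplified single-threaded Python model of PTX mbarrier phase semantics: a barrier initialized with N expected arrivals completes its current phase after N arrive operations, flips its phase bit, and re-arms. The `MbarrierModel` class and its methods are hypothetical, ignore the transaction-byte counting used with TMA loads, and are not the Mojo `SharedMemBarrier` API.

```python
class MbarrierModel:
    """Simplified model of an mbarrier's arrival count and phase bit (hypothetical)."""

    def __init__(self, arrive_count: int) -> None:
        if arrive_count <= 0:
            raise ValueError("arrive_count must be positive")
        self.expected = arrive_count  # arrivals needed to complete one phase
        self.pending = arrive_count   # arrivals still missing in the current phase
        self.phase = 0                # parity of the current (incomplete) phase

    def arrive(self) -> None:
        # One participant signals arrival; the last arrival completes the
        # phase, flips the parity bit, and re-arms the counter.
        self.pending -= 1
        if self.pending == 0:
            self.phase ^= 1
            self.pending = self.expected

    def test_wait(self, parity: int) -> bool:
        # True once the phase with the given parity has completed,
        # i.e. the barrier has moved on to the other parity.
        return self.phase != parity


# Example: a barrier that expects two consumer arrivals per phase.
bar = MbarrierModel(2)
bar.arrive()
print(bar.test_wait(0))  # False: one arrival still missing
bar.arrive()
print(bar.test_wait(0))  # True: phase 0 completed
```

An arrival count that is too low lets a waiter proceed before every participant has signaled, and one that is too high means the phase never completes; that is why the counts are computed from the kernel configuration rather than hard-coded.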