Mojo module
matmul_kernels
SM100 Matmul Kernel Structs - GPU kernel entry points and helpers.
This module contains the GPU kernel structs for SM100 matmul:
- WarpRole: Warp specialization roles (MMA, Load, Scheduler, Epilogue)
- KernelContext: Common kernel state (election vars, CTA coords, masks)
- B200MatmulSmem: Shared memory layout for the kernel
- BlackwellMatmulSM100Kernel: Main kernel struct with run() and run_splitk()
- BlackwellMatmulSM100FallbackKernel: Simple fallback kernel
- consumer_main_loop: MMA consumer loop (for external callers)
Output pipeline functions (copy_accum_to_gmem, multi_stage_store_C) are in matmul_output.mojo.
The kernel implements a warp-specialized architecture:
- Scheduler warp: CLC-based tile scheduling
- TMA Load warp: Async memory transfers
- MMA warp: Tensor core operations with TMEM accumulators
- Epilogue warps: Output from TMEM to GMEM (see matmul_output.mojo)
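The division of labor above can be sketched as a role dispatch at the top of the kernel body. This is pseudo-Mojo for illustration only: the `WarpRole` variants, `consumer_main_loop`, and `copy_accum_to_gmem` come from this module's listing, but the role-derivation step and the `schedule_next_tile` / `tma_load_ab_tiles` helpers are hypothetical placeholders, not this module's API:

```mojo
# Hypothetical skeleton of a warp-specialized kernel body (illustrative only).
var role = ...  # derived from the warp id; the exact mapping is kernel-specific

if role == WarpRole.Scheduler:
    schedule_next_tile()     # CLC-based tile scheduling (placeholder name)
elif role == WarpRole.Load:
    tma_load_ab_tiles()      # async TMA copies into shared memory (placeholder name)
elif role == WarpRole.MMA:
    consumer_main_loop(...)  # tensor-core MMA into TMEM accumulators
else:  # WarpRole.Epilogue
    copy_accum_to_gmem(...)  # TMEM -> GMEM output; see matmul_output.mojo
```

Each warp group runs only its own branch, so loads, MMAs, and stores overlap across warps instead of serializing within one.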
comptime values
RLayout32Bits
`comptime RLayout32Bits[layout: Layout] = RuntimeLayout[layout, element_type=DType.uint32, linear_idx_type=DType.uint32]`
Parameters
- layout (`Layout`):
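A minimal usage sketch of this alias, assuming the wrapped `Layout` is fully static so `RuntimeLayout`'s default constructor applies (the tile shape and import path below are illustrative assumptions, not taken from the kernel):

```mojo
from layout import Layout, RuntimeLayout  # import path may differ by version

alias tile = Layout.row_major(64, 128)  # illustrative tile shape

# RLayout32Bits[tile] is a RuntimeLayout whose element and linear-index types
# are both uint32, keeping index arithmetic in 32-bit GPU registers.
var rt = RLayout32Bits[tile]()
```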
Structs
- `B200MatmulSmem`: Shared memory layout for B200 SM100 matrix multiplication kernel.
- `BlackwellMatmulSM100FallbackKernel`: Simple fallback matmul kernel for SM100 (B200).
- `BlackwellMatmulSM100Kernel`: Blackwell SM100 GEMM kernel with warp specialization.
- `KernelContext`: Shared kernel state: election vars, CTA coords, multicast masks, pipeline states.
- `WarpRole`: Warp role identifiers for SM100 warp-specialized kernel.
Functions
- `consumer_main_loop`: Consume tiles from shared memory and execute MMA operations.
- `f32_frag_to_smem`:
- `stsm_helper`: Store a fragment to shared memory using st.matrix.