
Mojo module

matmul_kernels

SM100 Matmul Kernel Structs - GPU kernel entry points and helpers.

This module contains the GPU kernel structs and helper functions for SM100 matmul:

  • WarpRole: Warp specialization roles (MMA, Load, Scheduler, Epilogue)
  • KernelContext: Common kernel state (election vars, CTA coords, masks)
  • B200MatmulSmem: Shared memory layout for the kernel
  • BlackwellMatmulSM100Kernel: Main kernel struct with run() and run_splitk()
  • BlackwellMatmulSM100FallbackKernel: Simple fallback kernel
  • consumer_main_loop: MMA consumer loop (for external callers)

Output pipeline functions (copy_accum_to_gmem, multi_stage_store_C) are in matmul_output.mojo.

The kernel implements a warp-specialized architecture:

  • Scheduler warp: CLC-based tile scheduling
  • TMA Load warp: Async memory transfers
  • MMA warp: Tensor core operations with TMEM accumulators
  • Epilogue warps: Output from TMEM to GMEM (see matmul_output.mojo)
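To make the warp-specialized split above concrete, here is a minimal host-side sketch of how a warp index might be mapped to one of the four roles. It is an illustration only: the role codes, the warp ordering, the number of epilogue warps, and the name role_for_warp are all assumptions made for this example and are not taken from WarpRole or the kernel structs in this module.

```mojo
# Hypothetical role codes; the real WarpRole struct may encode roles differently.
comptime ROLE_EPILOGUE = 0
comptime ROLE_SCHEDULER = 1
comptime ROLE_LOAD = 2
comptime ROLE_MMA = 3

# Assumed warp budget for illustration: 4 epilogue warps followed by
# one warp each for scheduling, TMA loads, and MMA.
comptime NUM_EPILOGUE_WARPS = 4


fn role_for_warp(warp_id: Int) -> Int:
    """Map a warp index within the CTA to a role code (assumed ordering)."""
    if warp_id < NUM_EPILOGUE_WARPS:
        return ROLE_EPILOGUE
    if warp_id == NUM_EPILOGUE_WARPS:
        return ROLE_SCHEDULER
    if warp_id == NUM_EPILOGUE_WARPS + 1:
        return ROLE_LOAD
    return ROLE_MMA


fn main():
    # Print the role chosen for each of the 7 warps in this assumed layout.
    for w in range(7):
        print("warp", w, "-> role", role_for_warp(w))
```

In a kernel of this style, each warp would typically evaluate its role once at entry and then run only the loop for that role (scheduling, loading, MMA, or epilogue), so the four stages can overlap rather than run in sequence.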

comptime values

RLayout32Bits

comptime RLayout32Bits[layout: Layout] = RuntimeLayout[layout, element_type=DType.uint32, linear_idx_type=DType.uint32]

Parameters

  • layout (Layout): The compile-time layout used to parameterize the RuntimeLayout.

Structs

Functions
