Mojo module

block_scaled_matmul_kernels

Block-scaled SM100 matmul kernel for MXFP8 matrix multiplication.

Warp-specialized architecture:

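  • Scheduler: CLC-based (Cluster Launch Control) distribution of output tiles
  • TMA Load: Asynchronous Tensor Memory Accelerator loads for A, B, and their scaling factors (SFA, SFB)
  • MMA: Block-scaled tensor core ops that accumulate in tensor memory (TMEM)
  • Epilogue: Output pipeline from TMEM to shared memory (SMEM) to global memory (GMEM)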
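
To make the arithmetic behind "block-scaled" concrete, here is a minimal host-side reference model in Python. It is only a sketch of the math, not this module's API and not how the kernel is written. Several things in it are assumptions for illustration: the function name `block_scaled_matmul`, the block size of 32 elements along K with power-of-two scales (as in the OCP Microscaling formats), scales passed as plain unbiased integer exponents, FP8 elements already decoded to Python floats, and B stored as N x K.

```python
# Reference model of a block-scaled matmul (illustrative, not the kernel API).
# Assumption: one shared power-of-two scale per 32 consecutive elements along K.
BLOCK = 32


def block_scaled_matmul(a, sfa, b, sfb, block=BLOCK):
    """Compute C = A @ B^T where every K-block carries its own scale.

    a   : M x K nested list of decoded element values
    sfa : M x (K // block) integer exponents, scale = 2 ** exponent
    b   : N x K nested list of decoded element values (assumed K-major)
    sfb : N x (K // block) integer exponents, scale = 2 ** exponent
    """
    m, k, n = len(a), len(a[0]), len(b)
    if k % block != 0:
        raise ValueError("K must be a multiple of the scaling block size")

    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kb in range(k // block):
                # Unscaled dot product over one K-block.
                partial = 0.0
                for kk in range(kb * block, (kb + 1) * block):
                    partial += a[i][kk] * b[j][kk]
                # Apply the scales of A's and B's block once per block.
                acc += partial * (2.0 ** sfa[i][kb]) * (2.0 ** sfb[j][kb])
            out[i][j] = acc
    return out


# One output element, K = 64, so two scaling blocks per row.
a = [[1.0] * 64]
b = [[2.0] * 64]
sfa = [[1, 0]]   # A block scales: 2, 1
sfb = [[0, -1]]  # B block scales: 1, 0.5
result = block_scaled_matmul(a, sfa, b, sfb)
```

In this example each block contributes an unscaled partial sum of 64; the first block is scaled by 2 and the second by 0.5, so `result[0][0]` is 160.0. A model like this can serve as a correctness oracle for small shapes when comparing against kernel output.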

comptime values

UnsafePointer

comptime UnsafePointer = LegacyUnsafePointer[?, address_space=?, origin=?]

Structs
