Mojo module

block_scaled_matmul_kernels

Block-scaled SM100 matmul kernel for MXFP8 matrix multiplication.

Warp-specialized architecture:

- Scheduler: CLC-based tile distribution
- TMA Load: Async loads for A, B, and their scaling factors (SFA, SFB)
- MMA: Block-scaled tensor core ops with TMEM accumulators
- Epilogue: TMEM → SMEM → GMEM output pipeline
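
As a rough illustration of what "block-scaled" means for the operands listed above, the following sketch computes a reference result on the host in plain Python. It is not this module's API and says nothing about how the kernel stores or loads SFA and SFB. It assumes the OCP MX convention of one E8M0 scale (a biased power-of-two exponent) per 32 consecutive elements along K, and it assumes B is stored K-major as N x K. The names `e8m0_to_float` and `block_scaled_matmul` are hypothetical, and element values are taken as already decoded to Python floats.

```python
BLOCK = 32  # MX block size along K (assumed, per the OCP MX convention)


def e8m0_to_float(code):
    """Decode an E8M0 scale: an 8-bit biased exponent with value 2**(code - 127).

    Code 255 is reserved for NaN in the MX convention.
    """
    if not 0 <= code <= 255:
        raise ValueError("E8M0 code must fit in 8 bits")
    if code == 255:
        return float("nan")
    return 2.0 ** (code - 127)


def block_scaled_matmul(a, sfa, b, sfb):
    """Reference C = A @ B^T with per-block scales (hypothetical helper).

    a:   M x K element values, already decoded to float
    sfa: M x (K // BLOCK) E8M0 codes, one per K-block of each row of A
    b:   N x K element values (K-major layout is an assumption of this sketch)
    sfb: N x (K // BLOCK) E8M0 codes, one per K-block of each row of B
    """
    m_dim, n_dim, k_dim = len(a), len(b), len(a[0])
    if k_dim % BLOCK != 0:
        raise ValueError("K must be a multiple of the block size")
    c = [[0.0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0.0
            for kb in range(k_dim // BLOCK):
                lo = kb * BLOCK
                # Unscaled dot product over one 32-element block.
                partial = sum(a[m][k] * b[n][k] for k in range(lo, lo + BLOCK))
                # Both scales apply once per block, not once per element.
                scale = e8m0_to_float(sfa[m][kb]) * e8m0_to_float(sfb[n][kb])
                acc += scale * partial
            c[m][n] = acc
    return c
```

A reference like this is mainly useful for checking a GPU result on small shapes. Because the scales are powers of two, applying them per block changes only the exponent of each partial sum.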

`comptime` values

`UnsafePointer`

comptime UnsafePointer = LegacyUnsafePointer[?, address_space=?, origin=?]

Structs

- `BlackwellBlockScaledMatmulKernel`: SM100 block-scaled GEMM kernel for MXFP8 (FP8 with microscaling).
- `BlockScaledKernelContext`: Per-CTA state: election flags, coordinates, multicast masks, TMEM offsets.
