Mojo module
blockwise_fp8_matmul
Host-side (CPU) entry points for the blockwise FP8 matmul on SM100 (NVIDIA Blackwell) GPUs.
Creates TMA descriptors for the A, B, and C matrices and for the A scales, then launches the warp-specialized blockwise FP8 kernel, which uses register-based accumulation.
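The following pure-Python reference sketch illustrates the arithmetic that "blockwise FP8 with register-based accumulation" refers to. It is not the Mojo kernel and has no TMA, tiling, or warp specialization. It rests on several assumptions not stated on this page. FP8 inputs are stood in for by already-quantized Python floats. A holds one scale per row per K-block. B is stored N x K and holds one scale per (N-block, K-block) tile. The default block size is 128. The function name `blockwise_scaled_matmul` and its parameters are hypothetical. The key idea is that each K-block's partial dot product is dequantized with its scales before being added to a single FP32 accumulator.

```python
from typing import List

Matrix = List[List[float]]


def blockwise_scaled_matmul(
    a_q: Matrix,       # M x K quantized A values (FP8 stood in by floats; assumption)
    a_scales: Matrix,  # M x ceil(K / block_k): one scale per row per K-block (assumption)
    b_q: Matrix,       # N x K quantized B values, K-major (assumption)
    b_scales: Matrix,  # ceil(N / block_n) x ceil(K / block_k): one scale per tile (assumption)
    block_k: int = 128,
    block_n: int = 128,
) -> Matrix:
    """Hypothetical reference for a blockwise-scaled matmul C = A @ B^T."""
    m, k = len(a_q), len(a_q[0])
    n = len(b_q)
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0  # plays the role of the kernel's accumulator register
            for kb, k0 in enumerate(range(0, k, block_k)):
                # Unscaled partial dot product over one K-block.
                partial = 0.0
                for kk in range(k0, min(k0 + block_k, k)):
                    partial += a_q[i][kk] * b_q[j][kk]
                # Apply the A and B block scales once per K-block, then accumulate.
                acc += partial * a_scales[i][kb] * b_scales[j // block_n][kb]
            c[i][j] = acc
    return c


if __name__ == "__main__":
    # Two K-blocks of size 2, one output element.
    c = blockwise_scaled_matmul(
        [[1.0, 2.0, 3.0, 4.0]], [[0.5, 2.0]],
        [[1.0, 1.0, 1.0, 1.0]], [[1.0, 0.25]],
        block_k=2, block_n=1,
    )
    print(c)  # [[5.0]]
```

In the usage example, the result equals dequantizing A to `[0.5, 1, 6, 8]` and B to `[1, 1, 0.25, 0.25]` before multiplying. Applying scales per K-block lets the inner products run on low-precision values while keeping the running sum in higher precision.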
Functions
- blockwise_fp8_matmul: Launches the blockwise FP8 matmul kernel.