Mojo module
matmul
SM100 Matmul CPU entry points - TMA setup and kernel launch wrappers.
This module contains the CPU-side code for SM100 matrix multiplication:
- TMA descriptor creation
- Kernel instantiation and launch via ctx.enqueue_function
All GPU code (kernel structs, runtime functions) is in matmul_kernels.mojo.
Functions
- blackwell_batched_matmul_tma_umma_warp_specialized: Public entry point for batched SM100 BF16 matmul.
- blackwell_matmul_tma_umma_warp_specialized: Public entry point for SM100 matmul (non-batched, rank-2 inputs).
- matmul_sm100_fallback