Mojo module
mma
Apple Silicon MMA operation struct for TileTensor.
Simdgroup-level, register-owning MMA abstraction following the AMD MmaOp pattern. Each simdgroup (32 threads) instantiates its own MmaOpApple.
Use mma() for interior tiles, where the caller guarantees that every element is in bounds. Use mma() with bounded=True for edge tiles; out-of-bounds (OOB) elements are zero-filled. The kernel should make the interior-versus-edge check once per simdgroup, not once per load.
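The interior/edge split can be illustrated with a minimal plain-Python sketch. It is not the Mojo API: the helper names tile_is_interior, load_tile, and load_tile_auto are hypothetical, and the matrix is an ordinary list of lists. It only shows the idea of deciding once per tile whether bounds checks are needed, then zero-filling out-of-bounds elements on the bounded path.

```python
def tile_is_interior(row0, col0, tile_m, tile_n, rows, cols):
    """Return True if the whole tile lies inside the matrix (hypothetical helper)."""
    return row0 + tile_m <= rows and col0 + tile_n <= cols


def load_tile(matrix, row0, col0, tile_m, tile_n, bounded):
    """Copy a tile out of `matrix`; with bounded=True, OOB elements become 0.0."""
    rows, cols = len(matrix), len(matrix[0])
    if not bounded:
        # Fast path: the caller has guaranteed the tile is in bounds,
        # so no per-element checks are performed.
        return [list(matrix[row0 + i][col0:col0 + tile_n]) for i in range(tile_m)]
    # Bounded path: start from zeros and copy only the in-bounds elements.
    tile = [[0.0] * tile_n for _ in range(tile_m)]
    for i in range(tile_m):
        for j in range(tile_n):
            r, c = row0 + i, col0 + j
            if r < rows and c < cols:
                tile[i][j] = matrix[r][c]
    return tile


def load_tile_auto(matrix, row0, col0, tile_m, tile_n):
    """Make the interior-versus-edge decision once per tile, then load."""
    rows, cols = len(matrix), len(matrix[0])
    interior = tile_is_interior(row0, col0, tile_m, tile_n, rows, cols)
    return load_tile(matrix, row0, col0, tile_m, tile_n, bounded=not interior)


matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
interior_tile = load_tile_auto(matrix, 0, 0, 2, 2)  # [[1.0, 2.0], [4.0, 5.0]]
edge_tile = load_tile_auto(matrix, 2, 2, 2, 2)      # [[9.0, 0.0], [0.0, 0.0]]
```

In this sketch the single tile_is_interior call plays the role of the once-per-simdgroup check: interior tiles take the unchecked path, and only tiles that cross the matrix boundary pay for per-element tests.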
Structs