Mojo module
mma_apple
Apple Silicon MMA implementation for matrix multiply-accumulate operations.
This module provides MMA implementations for Apple M5 GPUs using the simdgroup_matrix hardware instructions (Metal 4.0 / AIR 2.8.0).
Supported operations:
- Float multiply-accumulate: {F16, BF16, F32} inputs, F32 accumulator
- Integer widening multiply-accumulate: {I8, U8} inputs, I32/U32 accumulator
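To make the "widening" in the integer case concrete, here is a minimal scalar reference model in plain Python. It is only a sketch of the arithmetic D = A x B + C on a 16x16 tile and does not use or reproduce this module's API; the names `TILE` and `reference_mma_i8` are hypothetical, and overflow of the 32-bit accumulator is not modeled.

```python
TILE = 16  # tile size assumed from the 16x16 fragments this module loads and stores


def reference_mma_i8(a, b, c):
    """Return D = A x B + C for TILE x TILE matrices given as lists of lists.

    A and B hold signed 8-bit values; C and D hold the wider accumulator
    values. Hypothetical helper for illustration, not part of mma_apple.
    """
    for matrix in (a, b):
        for row in matrix:
            for value in row:
                if not -128 <= value <= 127:
                    raise ValueError("inputs must fit in a signed 8-bit integer")

    d = [[0] * TILE for _ in range(TILE)]
    for i in range(TILE):
        for j in range(TILE):
            acc = c[i][j]  # start from the existing accumulator value
            for k in range(TILE):
                # Each product can reach 127 * 127 = 16129, which already
                # exceeds the I8 range, hence the wider accumulator.
                acc += a[i][k] * b[k][j]
            d[i][j] = acc
    return d
```

With every element of A and B set to 127 and every element of C set to 1, each output element is 16 x 16129 + 1 = 258065, far outside what an 8-bit result could hold.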
Functions
- apple_mma_load: Loads a 16x16 matrix fragment for the current simdgroup thread.
- apple_mma_store: Stores a 16x16 matrix fragment from the current simdgroup thread.