Mojo module
tensor_ops
This module provides tensor core operations and utilities for GPU computation.
The module includes functions for:
- Tensor core based reductions (tc_reduce) supporting various data types and SIMD widths
- GEVM (General Vector-Matrix Multiplication) reductions using tensor cores
- Efficient warp-level reductions leveraging tensor core operations
The tensor core operations are optimized for NVIDIA GPUs and support different data types, including float32, float16, and bfloat16. Reduction operations are available in both scalar and vector variants, and the vector variants support several SIMD widths.
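Tensor cores compute small matrix products, so a reduction is commonly expressed as a product with an all-ones matrix. This page does not show the signatures or the implementation of tc_reduce, so the sketch below is plain Python rather than Mojo, and it illustrates only that general idea. The names matmul and ones_matrix_reduce and the 4x4 tile size are hypothetical choices for illustration, not part of this module.

```python
def matmul(a, b):
    """Plain matrix product, standing in for one tensor core multiply."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def ones_matrix_reduce(lane_values, tile=4):
    """Sum tile*tile values using two products with all-ones matrices.

    Hypothetical helper: real tensor core tile shapes and data layouts
    differ and depend on the GPU and data type.
    """
    assert len(lane_values) == tile * tile
    # Lay the per-lane values out as a tile x tile operand.
    b = [lane_values[r * tile:(r + 1) * tile] for r in range(tile)]
    ones = [[1.0] * tile for _ in range(tile)]
    col_sums = matmul(ones, b)      # every row now holds the column sums
    total = matmul(col_sums, ones)  # every entry now holds the full sum
    return total[0][0]


values = [float(i) for i in range(16)]
result = ones_matrix_reduce(values)
print(result)
```

After the second product every entry of the result holds the full sum. On a GPU, that would leave the reduced value available to all participating lanes without a separate broadcast step.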
Key functions:
- tc_reduce: Main tensor core reduction function supporting various types and widths
- tc_reduce_gevm_8x: 8x GEVM reduction using tensor cores
- tc_reduce_gevm_4x: 4x GEVM reduction using tensor cores
Note: Most operations require NVIDIA GPUs with tensor core support. Operations are optimized for warp-level execution.
Functions
- tc_reduce: Performs tensor core based reduction on a SIMD vector.
- tc_reduce_gevm_4x: Performs a 4x GEVM reduction using tensor cores.
- tc_reduce_gevm_8x: Performs an 8x GEVM reduction using tensor cores.