Mojo module

tensor_ops

This module provides tensor core operations and utilities for GPU computation.

The module includes functions for:

Tensor-core-based reductions (tc_reduce) supporting various data types and SIMD widths
GEVM (General Vector-Matrix Multiplication) reductions using tensor cores
Efficient warp-level reductions leveraging tensor core operations

The tensor core operations are optimized for NVIDIA GPUs and support different data types, including float32, float16, and bfloat16. The module provides both scalar and vector variants of the reduction operations, covering several SIMD widths.
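
To give a rough intuition for how a reduction can be mapped onto matrix hardware, the sketch below emulates the idea in plain Python: multiplying a row of per-lane values by a column of ones yields their sum. This is a conceptual illustration only. The page does not describe how tc_reduce is implemented, and the names `matmul`, `reduce_via_matmul`, and `WARP_SIZE` are hypothetical helpers, not part of tensor_ops.

```python
# Conceptual sketch only: emulates in plain Python how a sum reduction can be
# phrased as a matrix multiply, the kind of operation tensor cores accelerate.
# All names here are hypothetical and are not part of the tensor_ops module.

WARP_SIZE = 32  # number of lanes in a warp on NVIDIA GPUs


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def reduce_via_matmul(lane_values):
    """Sum one value per lane by multiplying with a column of ones."""
    assert len(lane_values) == WARP_SIZE
    row = [lane_values]                        # 1 x 32 matrix of lane values
    ones = [[1.0] for _ in range(WARP_SIZE)]   # 32 x 1 matrix of ones
    return matmul(row, ones)[0][0]             # the single entry is the sum


lane_values = [float(i) for i in range(WARP_SIZE)]
total = reduce_via_matmul(lane_values)
```

On real hardware the multiply would be a single warp-wide matrix instruction rather than a Python loop, and the operand shapes would follow the fixed tile sizes the GPU supports; the sketch only shows why a multiply by ones acts as a sum.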

Key functions:

tc_reduce: Main tensor core reduction function supporting various types and widths
tc_reduce_gevm_8x: 8x GEVM reduction using tensor cores
tc_reduce_gevm_4x: 4x GEVM reduction using tensor cores

Note: Most operations require NVIDIA GPUs with tensor core support. Operations are optimized for warp-level execution.

Functions

tc_reduce: Performs a tensor-core-based reduction on a SIMD vector.
tc_reduce_gevm_4x: Performs a 4x GEVM reduction using tensor cores.
tc_reduce_gevm_8x: Performs an 8x GEVM reduction using tensor cores.
