
Mojo module

tensor_ops

This module provides tensor core operations and utilities for GPU computation.

The module includes functions for:

  • Tensor core based reductions (tc_reduce) supporting various data types and SIMD widths
  • GEVM (general vector-matrix multiplication) reductions using tensor cores
  • Efficient warp-level reductions leveraging tensor core operations
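As a rough illustration of how a matrix-multiply unit can perform a reduction, the sketch below sums a list of values by multiplying small tiles with an all-ones matrix. It is plain Python, not Mojo, and it does not call this module. The helper names (`matmul`, `reduce_sum_via_matmul`) and the 4x4 tile size are hypothetical, and whether `tc_reduce` is implemented this way internally is an assumption.

```python
def matmul(a, b):
    """Plain row-by-column matrix product on nested lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def reduce_sum_via_matmul(values, tile=4):
    """Sum `values` using only matrix products with an all-ones matrix.

    Hypothetical helper: mimics, on the CPU, the idea of letting a
    matrix-multiply unit do the additions of a reduction.
    """
    ones = [[1.0] * tile for _ in range(tile)]
    block = tile * tile
    total = 0.0
    for start in range(0, len(values), block):
        chunk = list(values[start:start + block])
        # Zero-pad the last chunk so it fills a whole tile.
        chunk += [0.0] * (block - len(chunk))
        a = [chunk[r * tile:(r + 1) * tile] for r in range(tile)]
        # a @ ones: every entry in row r holds the sum of row r of `a`.
        row_sums = matmul(a, ones)
        # ones @ row_sums: every entry holds the sum of the whole tile.
        tile_sum = matmul(ones, row_sums)
        total += tile_sum[0][0]
    return total


print(reduce_sum_via_matmul([float(i) for i in range(1, 11)]))
```

On a GPU the two products would map to tensor core matrix-multiply instructions executed cooperatively by a warp. Here they are ordinary loops that only show the arithmetic.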

The tensor core operations are optimized for NVIDIA GPUs and support float32, float16, and bfloat16 data. The module provides both scalar and vector variants of the reduction operations, with the vector variants available at several SIMD widths.

Key functions:

  • tc_reduce: Main tensor core reduction function supporting various types and widths
  • tc_reduce_gevm_8x: 8x GEVM reduction using tensor cores
  • tc_reduce_gevm_4x: 4x GEVM reduction using tensor cores

Note: Most operations require NVIDIA GPUs with tensor core support. Operations are optimized for warp-level execution.

Functions