Mojo module

simd

Implements SIMD primitives and abstractions.

Provides high-performance SIMD primitives and abstractions for vectorized computation in Mojo. It enables efficient data-parallel operations by leveraging hardware vector processing units across different architectures.

Key Features:

Architecture-agnostic SIMD abstractions with automatic hardware detection
Optimized vector operations for common numerical computations
Explicit control over vectorization strategies and memory layouts
Zero-cost abstractions that compile to efficient machine code
Support for different vector widths and element types

Primary Components:

Vector types: Strongly-typed vector containers with element-wise operations
SIMD intrinsics: Low-level access to hardware SIMD instructions
Vectorized algorithms: Common algorithms optimized for SIMD execution
Memory utilities: Aligned memory allocation and vector load/store operations

Performance Considerations:

Vector width selection should match target hardware capabilities
Memory alignment affects load/store performance
Data layout transformations may be necessary for optimal vectorization
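
The first consideration can be illustrated without any Mojo API. The following plain-Python sketch processes an input in fixed-width chunks, as a vectorized loop would, and handles the leftover elements in a scalar tail. The function name `chunked_sum` and the default width of 4 are hypothetical and chosen only for illustration.

```python
def chunked_sum(values, width=4):
    """Sum `values` in chunks of `width` lanes plus a scalar tail.

    `width` stands in for a hardware vector width; this illustrates
    the loop structure only, not real SIMD execution.
    """
    lanes = [0.0] * width                      # one accumulator per lane
    main_end = len(values) - len(values) % width

    # Main loop: each iteration consumes exactly `width` elements.
    for start in range(0, main_end, width):
        for lane in range(width):
            lanes[lane] += values[start + lane]

    total = sum(lanes)                         # horizontal reduction

    # Tail loop: the remaining len(values) % width elements.
    for value in values[main_end:]:
        total += value
    return total


print(chunked_sum([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]))
```

When the input length is a multiple of the width, the tail loop does no work, which is one reason the choice of width interacts with data sizes as well as with the hardware.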

Integration: This module is designed to work seamlessly with other Mojo numerical computing components, including tensor operations, linear algebra routines, and domain-specific libraries for machine learning and scientific computing.

Aliases

`BFloat16`

comptime BFloat16 = BFloat16

Represents a 16-bit brain floating point value.
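
For reference, bfloat16 keeps the sign bit, all 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 float32. The plain-Python sketch below shows that relationship. The helper names are hypothetical, and the conversion truncates, whereas a real conversion would typically round to nearest.

```python
import struct


def float32_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x, by truncation."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16      # keep sign, 8 exponent bits, 7 mantissa bits


def bfloat16_bits_to_float(bits16):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return x


bits = float32_to_bfloat16_bits(3.14159)
print(hex(bits), bfloat16_bits_to_float(bits))
```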

`Byte`

comptime Byte = UInt8

Represents a byte (backed by an 8-bit unsigned integer).

`Float16`

comptime Float16 = Float16

Represents a 16-bit floating point value.

`Float32`

comptime Float32 = Float32

Represents a 32-bit floating point value.

`Float4_e2m1fn`

comptime Float4_e2m1fn = Float4_e2m1fn

Represents a 4-bit e2m1 floating point format, encoded as s.ee.m and defined by the Open Compute MX Format Specification:

(s)ign: 1 bit
(e)xponent: 2 bits
(m)antissa: 1 bit
exponent bias: 1
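
To make the layout concrete, here is a minimal plain-Python sketch that decodes a 4-bit pattern by hand. The function name `decode_e2m1` is hypothetical, and the sketch assumes the usual subnormal convention (an exponent field of 0 means no implicit leading 1).

```python
def decode_e2m1(nibble):
    """Decode a 4-bit s.ee.m pattern (exponent bias 1) to a float."""
    sign = -1.0 if (nibble >> 3) & 0x1 else 1.0
    exponent = (nibble >> 1) & 0x3
    mantissa = nibble & 0x1
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias = 0.
        return sign * (mantissa / 2.0)
    # Normal: implicit leading 1, exponent offset by the bias of 1.
    return sign * (1.0 + mantissa / 2.0) * 2.0 ** (exponent - 1)


# The eight non-negative encodings, 0b0000 through 0b0111.
print([decode_e2m1(n) for n in range(8)])
```

Under these assumptions the non-negative encodings decode to 0, 0.5, 1, 1.5, 2, 3, 4, and 6, and no bit pattern is reserved for inf or NaN.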

`Float64`

comptime Float64 = Float64

Represents a 64-bit floating point value.

`Float8_e4m3fn`

comptime Float8_e4m3fn = Float8_e4m3fn

Represents the E4M3 floating point format defined in the OFP8 standard.

This type is named differently across libraries and vendors, for example:

Mojo, PyTorch, JAX, and LLVM refer to it as e4m3fn.
OCP, NVIDIA CUDA, and AMD ROCm refer to it as e4m3.

In these contexts, they are all referring to the same finite type specified in the OFP8 standard above, encoded as seeeemmm:

(s)ign: 1 bit
(e)xponent: 4 bits
(m)antissa: 3 bits
exponent bias: 7
nan: 01111111, 11111111
-0: 10000000
fn: finite (no inf or -inf encodings)
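
The field layout above can be checked with a small decoder. This is a plain-Python sketch, not Mojo API: the function name `decode_e4m3fn` is hypothetical, and the subnormal handling assumes the standard convention for an exponent field of 0.

```python
import math


def decode_e4m3fn(byte):
    """Decode an 8-bit seeeemmm pattern (exponent bias 7) to a float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan                       # 01111111 and 11111111
    if exponent == 0:
        # Subnormal: mantissa/8 scaled by 2^(1 - bias) = 2^-6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent offset by the bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


print(decode_e4m3fn(0b00111000))   # exponent field 7, mantissa 0
print(decode_e4m3fn(0b01111110))   # largest finite encoding
```

Because only the all-ones mantissa is reserved for NaN, the top exponent value still carries finite numbers, which is what the "fn" suffix refers to.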

`Float8_e4m3fnuz`

comptime Float8_e4m3fnuz = Float8_e4m3fnuz

Represents an 8-bit e4m3fnuz floating point format, encoded as seeeemmm:

(s)ign: 1 bit
(e)xponent: 4 bits
(m)antissa: 3 bits
exponent bias: 8
nan: 10000000
fn: finite (no inf or -inf encodings)
uz: unsigned zero (no -0 encoding)
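
The practical effect of the larger bias and the single NaN encoding can be seen in a short plain-Python sketch. The function name `decode_e4m3fnuz` is hypothetical, and subnormals are assumed to follow the standard convention.

```python
import math


def decode_e4m3fnuz(byte):
    """Decode an 8-bit seeeemmm pattern (exponent bias 8) to a float."""
    if byte == 0x80:
        return math.nan        # the would-be negative zero encodes NaN
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0:
        # Subnormal: mantissa/8 scaled by 2^(1 - bias) = 2^-7.
        return sign * (mantissa / 8.0) * 2.0 ** -7
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 8)


print(decode_e4m3fnuz(0b00111000))   # exponent field 7 with bias 8
print(decode_e4m3fnuz(0b01111111))   # all-ones pattern is finite here
```

With a bias of 8, the pattern 00111000 decodes to 0.5, whereas a bias of 7 would give 1.0 for the same bits.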

`Float8_e5m2`

comptime Float8_e5m2 = Float8_e5m2

Represents the 8-bit E5M2 floating point format from the OFP8 standard, encoded as seeeeemm:

(s)ign: 1 bit
(e)xponent: 5 bits
(m)antissa: 2 bits
exponent bias: 15
nan: {0,1}11111{01,10,11}
inf: 01111100
-inf: 11111100
-0: 10000000
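
E5M2 shares its sign bit, exponent width, and bias with IEEE 754 binary16 (float16), so an E5M2 byte can be decoded by padding it with eight zero mantissa bits. The plain-Python sketch below relies on that observation and on the standard `struct` format code `e` for half precision. The function name `decode_e5m2` is hypothetical.

```python
import struct


def decode_e5m2(byte):
    """Decode an 8-bit seeeeemm pattern by widening it to float16."""
    # The E5M2 byte becomes the high byte of a big-endian half float;
    # the low byte supplies the 8 missing mantissa bits as zeros.
    (value,) = struct.unpack(">e", bytes([byte, 0x00]))
    return value


print(decode_e5m2(0b00111100))   # exponent field 15, mantissa 0
print(decode_e5m2(0b01111100))   # positive infinity
print(decode_e5m2(0b01111011))   # largest finite encoding
```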

`Float8_e5m2fnuz`

comptime Float8_e5m2fnuz = Float8_e5m2fnuz

Represents an 8-bit e5m2fnuz floating point format, encoded as seeeeemm:

(s)ign: 1 bit
(e)xponent: 5 bits
(m)antissa: 2 bits
exponent bias: 16
nan: 10000000
fn: finite (no inf or -inf encodings)
uz: unsigned zero (no -0 encoding)
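
One way to see what "fn" and "uz" mean in practice is to enumerate all 256 bit patterns. The plain-Python sketch below does so with a hand-written decoder. The function name `decode_e5m2fnuz` is hypothetical, and subnormals are assumed to follow the standard convention.

```python
import math


def decode_e5m2fnuz(byte):
    """Decode an 8-bit seeeeemm pattern (exponent bias 16) to a float."""
    if byte == 0x80:
        return math.nan        # the only non-finite encoding
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 2) & 0x1F
    mantissa = byte & 0x3
    if exponent == 0:
        # Subnormal: mantissa/4 scaled by 2^(1 - bias) = 2^-15.
        return sign * (mantissa / 4.0) * 2.0 ** -15
    return sign * (1.0 + mantissa / 4.0) * 2.0 ** (exponent - 16)


values = [decode_e5m2fnuz(b) for b in range(256)]
finite = [v for v in values if not math.isnan(v)]
smallest_positive = min(v for v in finite if v > 0)
print(len(finite), max(finite), smallest_positive)
```

Under these assumptions, 255 of the 256 patterns are finite, and exactly one of them is zero.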

`Int128`

comptime Int128 = Int128

Represents a 128-bit signed scalar integer.

`Int16`

comptime Int16 = Int16

Represents a 16-bit signed scalar integer.

`Int256`

comptime Int256 = Int256

Represents a 256-bit signed scalar integer.

`Int32`

comptime Int32 = Int32

Represents a 32-bit signed scalar integer.

`Int64`

comptime Int64 = Int64

Represents a 64-bit signed scalar integer.

`Int8`

comptime Int8 = Int8

Represents an 8-bit signed scalar integer.

`Scalar`

comptime Scalar = Scalar[?]

Represents a scalar value of a given dtype, that is, a SIMD vector with a single element.

`U8x16`

comptime U8x16 = SIMD[DType.uint8, 16]

Represents a SIMD vector of 16 unsigned 8-bit integers.

`UInt128`

comptime UInt128 = UInt128

Represents a 128-bit unsigned scalar integer.

`UInt16`

comptime UInt16 = UInt16

Represents a 16-bit unsigned scalar integer.

`UInt256`

comptime UInt256 = UInt256

Represents a 256-bit unsigned scalar integer.

`UInt32`

comptime UInt32 = UInt32

Represents a 32-bit unsigned scalar integer.

`UInt64`

comptime UInt64 = UInt64

Represents a 64-bit unsigned scalar integer.

`UInt8`

comptime UInt8 = UInt8

Represents an 8-bit unsigned scalar integer.
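
The integer aliases above differ only in bit width and signedness. As a quick reference, this plain-Python sketch computes the representable range for each width, assuming two's complement representation for the signed types. The helper name `int_range` is hypothetical.

```python
def int_range(bits, signed):
    """Return (min, max) for a `bits`-wide integer.

    Signed ranges assume two's complement representation.
    """
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


for bits in (8, 16, 32, 64, 128, 256):
    lo_s, hi_s = int_range(bits, signed=True)
    lo_u, hi_u = int_range(bits, signed=False)
    print(f"Int{bits}: [{lo_s}, {hi_s}]  UInt{bits}: [{lo_u}, {hi_u}]")
```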

Structs

FastMathFlag: Flags for controlling fast-math optimizations in floating-point operations.
SIMD: Represents a vector type that leverages hardware acceleration to process multiple data elements with a single operation.
