Mojo numeric types reference
Mojo represents numbers in two ways. Int is a general-purpose integer
that matches the hardware's native word size. Other numeric types are built
on SIMD: Float32, Int64, UInt8, and even UInt.
SIMD
SIMD stands for "Single Instruction, Multiple Data". It
lets the CPU operate on multiple values at once using a single
instruction.
A SIMD value stores one or more values of the same type in a
fixed-size vector. The number of values is called the width,
and it must be a power of two.
The width is part of the type. For example,
SIMD[DType.float32, 4] is a vector of four 32-bit floats.
SIMD[DType.int8, 16] is a vector of sixteen 8-bit integers.
When a SIMD value holds one value, it behaves like a scalar.
When it holds several, operations apply to all values at once:
```mojo
var v = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
var doubled = v * 2.0  # All four elements doubled
print(doubled)         # [2.0, 4.0, 6.0, 8.0]
```

Modern CPUs can process 4, 8, 16, or more values in parallel with SIMD, which can significantly improve performance over scalar operations.
Element access
Read and write individual elements by index ("lane"):
```mojo
v[0]        # Read element 0 → Scalar[DType.float32]
v[0] = 5.0  # Write element 0
```

Operations
Arithmetic, comparison, and bitwise operations apply to all elements at once:
```mojo
var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
var b = SIMD[DType.float32, 4](5.0, 6.0, 7.0, 8.0)
var sum = a + b   # [6.0, 8.0, 10.0, 12.0]
var prod = a * b  # [5.0, 12.0, 21.0, 32.0]
```

Reductions combine all elements into a single value:
```mojo
a.reduce_add()  # 10.0
a.reduce_max()  # 4.0
a.reduce_min()  # 1.0
```

Casting converts each element to a different numeric type. The number of elements stays the same, even when the target type is wider or narrower:
```mojo
var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
var ints = a.cast[DType.int32]()    # [1, 2, 3, 4]
var wide = a.cast[DType.float64]()  # 4 × Float64
var tiny = a.cast[DType.float16]()  # 4 × Float16
```

Clamping restricts elements to a range. Both bounds are inclusive, so the result can equal the bounds:
```mojo
# max(min(self, upper_bound), lower_bound)
a.clamp(1.5, 3.5)  # [1.5, 2.0, 3.0, 3.5]
```

min() and max() are free functions, not methods:
```mojo
min(a, b)  # Element-wise minimum
max(a, b)  # Element-wise maximum
```

Scalar
A SIMD with one element is called a Scalar. Every fixed-width numeric name
in Mojo is a Scalar alias:
```mojo
# These are all the same type
var a: Scalar[DType.float32] = 3.14
var b: Float32 = 3.14
var c: SIMD[DType.float32, 1] = 3.14
```

When you write Float32, you're writing Scalar[DType.float32], which is
SIMD[DType.float32, 1].
DType specifications
DType names the kind of values stored in a SIMD vector, such as
float32, int64, or uint8. A DType doesn't store data. It tells
SIMD how to interpret each element and which operations to use:
```mojo
# DType selects a number kind, such as 32-bit float or 8-bit integer
var x: SIMD[DType.float32, 4] = ...  # four 32-bit floats
var y: SIMD[DType.int8, 16] = ...    # sixteen 8-bit ints
```

Use DType to write functions that work across numeric kinds:
```mojo
# Double a value. The cast is required because the generic type
# parameter can't be used directly with the literal `2`.
def double[T: DType](x: Scalar[T]) -> Scalar[T]:
    return x * UInt8(2).cast[T]()
```

Integer DType specifications
| Signed | Width | Unsigned | Width |
|---|---|---|---|
| DType.int8 | 8-bit | DType.uint8 | 8-bit |
| DType.int16 | 16-bit | DType.uint16 | 16-bit |
| DType.int32 | 32-bit | DType.uint32 | 32-bit |
| DType.int64 | 64-bit | DType.uint64 | 64-bit |
| DType.int128 | 128-bit | DType.uint128 | 128-bit |
| DType.int256 | 256-bit | DType.uint256 | 256-bit |
| DType.index | Machine width | DType.uint | Machine width |
Floating-point DType specifications
| Value | Selects |
|---|---|
| DType.float16 | 16-bit IEEE half |
| DType.bfloat16 | 16-bit brain float |
| DType.float32 | 32-bit IEEE single |
| DType.float64 | 64-bit IEEE double |
| DType.float8_e4m3fn | 8-bit (4-exp, 3-mantissa) |
| DType.float8_e4m3fnuz | 8-bit (4-exp, 3-mantissa), unsigned zero |
| DType.float8_e5m2 | 8-bit (5-exp, 2-mantissa) |
| DType.float8_e5m2fnuz | 8-bit (5-exp, 2-mantissa), unsigned zero |
| DType.float8_e8m0fnu | 8-bit (8-exp, no mantissa) |
| DType.float4_e2m1fn | 4-bit (2-exp, 1-mantissa) |
Other DType specifications
| Value | Selects |
|---|---|
| DType.bool | Boolean (1-bit) |
| DType.invalid | No valid DType has been set |
Integers
The unsized Int type
Int is Mojo's default integer. When you write var x = 42,
you assign an Int. It's the type behind loop counters, collection
indices, and len() results:
```mojo
from std.reflection import get_type_name

def main():
    var a: Int = 42
    comptime a_type = get_type_name[type_of(a)]()
    print("a:", a_type)  # a: Int
```

Int matches the hardware's native word size. It isn't built on SIMD.
Under the hood it wraps the machine's index register directly, which is why
it's the natural choice for counting and addressing.
Int is 64-bit on most platforms today, but that isn't guaranteed. Code
that depends on a specific width should use a sized type.
Int conforms to
Intable, Writable, Hashable, Comparable, and
TrivialRegisterPassable.
Integer-type bounds and bit width
Int exposes its bounds and bit width as compile-time constants:
| Constant | Value |
|---|---|
| Int.BITWIDTH | System word size (typically 64) |
| Int.MAX | Maximum representable value |
| Int.MIN | Minimum representable value |
```mojo
print(Int.BITWIDTH)  # 64 on most platforms
print(Int.MIN)       # -9223372036854775808
print(Int.MAX)       # 9223372036854775807
```

All integer types offer MAX and MIN as well:
| Constant | Value |
|---|---|
| <Integer-Type>.MAX | Maximum representable value |
| <Integer-Type>.MIN | Minimum representable value |
For example:
```mojo
print(UInt.MIN)    # 0
print(UInt.MAX)    # 18446744073709551615
print(UInt8.MAX)   # 255
print(Int8.MIN)    # -128
print(UInt32.MAX)  # 4294967295
print(Int32.MIN)   # -2147483648
print(SIMD[DType.int16, 1].MIN)  # -32768
```

UInt
UInt is a machine-width unsigned integer. Unlike Int, it's built on
SIMD:
```mojo
from std.reflection import get_type_name

def main():
    var b: UInt = 42
    comptime b_type = get_type_name[type_of(b)]()
    print("b:", b_type)  # b: SIMD[DType.uint, 1]
```

Sized integer types
Sized integer types have a declared width that stays the same on every platform.
| Signed | Width | Unsigned | Width |
|---|---|---|---|
| Int8 | 8-bit | UInt8 | 8-bit |
| Int16 | 16-bit | UInt16 | 16-bit |
| Int32 | 32-bit | UInt32 | 32-bit |
| Int64 | 64-bit | UInt64 | 64-bit |
| Int128 | 128-bit | UInt128 | 128-bit |
| Int256 | 256-bit | UInt256 | 256-bit |
Each is an alias for a one-element SIMD. For example,
Int32 is Scalar[DType.int32], which is
SIMD[DType.int32, 1]. The unsigned types follow the same
pattern.
Because these are built on SIMD, they share its traits:
TrivialRegisterPassable, Hashable, Comparable,
Writable.
Using sized vs unsized integers:
- Use Int and UInt for counts, indices, loop bounds, and general-purpose math. They're what the standard library expects and returns.
- Use sized integers when width matters: file layouts, pixel data, hardware registers, or any context where the number of bits is part of the contract.
- Use named types for scalar work and SIMD when you need vectors.
```mojo
var general = 42         # Int (machine width)
var small: UInt8 = 255
var large: Int64 = -9_000_000_000
var pair = SIMD[DType.uint32, 2](10, 20)  # a 2-element vector
```

Byte
Byte is another name for UInt8:
```mojo
var buf: List[Byte] = [0x48, 0x65, 0x6C, 0x6C, 0x6F]
```

Use Byte when the data represents raw bytes rather than
small numbers. It's the element type used in many I/O and
memory interfaces.
Floating point types
Mojo does not provide a Float type analogous to Int. Instead it
provides numerous fixed-width floating-point types. Each is an alias for a
one-element SIMD:
| Type | Bits | Standard | What it is |
|---|---|---|---|
| Float16 | 16 | IEEE 754 binary16 | Scalar[DType.float16] |
| Float32 | 32 | IEEE 754 binary32 | Scalar[DType.float32] |
| Float64 | 64 | IEEE 754 binary64 | Scalar[DType.float64] |
| BFloat16 | 16 | Brain float | Scalar[DType.bfloat16] |
| Float4_e2m1fn | 4 | OCP MX | Scalar[DType.float4_e2m1fn] |
| Float8_e3m4 | 8 | -- | Scalar[DType.float8_e3m4] |
| Float8_e4m3fn | 8 | OFP8 | Scalar[DType.float8_e4m3fn] |
| Float8_e4m3fnuz | 8 | -- | Scalar[DType.float8_e4m3fnuz] |
| Float8_e5m2 | 8 | OFP8 | Scalar[DType.float8_e5m2] |
| Float8_e5m2fnuz | 8 | -- | Scalar[DType.float8_e5m2fnuz] |
| Float8_e8m0fnu | 8 | OFP8 §5.4 | Scalar[DType.float8_e8m0fnu] |
| FloatLiteral | arbitrary | -- | Compile-time only. Materializes to Float64. |
Float16
16-bit IEEE 754 half-precision. The motivation is throughput and
memory bandwidth: half the storage of Float32 means twice the
values fit in registers and cache, and GPU tensor cores process it
at higher throughput. 1 sign bit, 5 exponent bits, 10 mantissa bits.
The narrower exponent range limits dynamic range to roughly ±65504.
Values beyond that overflow to infinity; very small values underflow
to zero. This makes Float16 workable for inference but less ideal
for training, where gradients can span many orders of magnitude. Use
BFloat16 for training instead.
Float16 is natively accelerated on GPUs. On CPU, it requires ARM FP16
extension or Intel AVX-512 FP16. Other CPUs fall back to software emulation.
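As a hedged illustration of those range limits (a sketch assuming the cast behavior described above; output not shown since it may vary by formatting):

```mojo
def main():
    # 70000 exceeds Float16's ~±65504 range, so narrowing overflows to inf.
    var big = Float32(70000.0)
    print(big.cast[DType.float16]())

    # 1e-9 is below Float16's smallest subnormal, so it underflows to zero.
    var tiny = Float32(1e-9)
    print(tiny.cast[DType.float16]())
```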
Float32
32-bit IEEE 754 single-precision. 23 mantissa bits give roughly 7 significant decimal digits; 8 exponent bits cover a range from roughly 1e-38 to 3.4e38. 1 sign bit, 8 exponent bits, 23 mantissa bits.
Float32 is natively accelerated on all GPU and CPU architectures. Use for
general numeric work and GPU computation.
Float64
64-bit IEEE 754 double-precision. Use when 7 significant decimal digits aren't enough: scientific simulations, financial calculations, or accumulated sums where rounding errors compound. 52 mantissa bits give roughly 15-16 significant decimal digits. 1 sign bit, 11 exponent bits, 52 mantissa bits.
BFloat16
16-bit brain floating-point developed by Google Brain for deep learning. 1 sign bit, 8 exponent bits, 7 mantissa bits.
Google Brain designed it to solve a specific problem with Float16
in training: Float16's 5 exponent bits create a dynamic range too
narrow for neural networks. Gradients overflow and underflow.
BFloat16 matches Float32's 8 exponent bits exactly, so values
stay in range throughout forward and backward passes.
The matching exponent range also makes Float32/BFloat16
conversion cheap: just truncate or extend the mantissa, no remapping.
This makes mixed-precision training feasible: compute in BFloat16
for speed and memory savings, keep optimizer state in Float32 for
precision. That combination drove its wide adoption as a training
format.
Use it for ML training and inference on supported hardware. The 7-bit mantissa is too imprecise for scientific or financial work.
BFloat16 is not supported on all platforms. It's currently unavailable on
Apple Silicon. Natively accelerated on NVIDIA Ampere (A100) and later,
AMD MI300X and later, and Intel CPUs with AMX or AVX-512 BF16
(Sapphire Rapids and later).
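A small sketch of that exponent-range advantage. This assumes BFloat16 support on your hardware, which, per the note above, is not universal:

```mojo
def main():
    # A magnitude near Float32's limit survives the trip through BFloat16
    # because both types use 8 exponent bits; only mantissa bits are lost.
    var x = Float32(3.0e38)
    var b = x.cast[DType.bfloat16]()
    print(b.cast[DType.float32]())  # still finite, slightly rounded

    # The same value exceeds Float16's 5-bit exponent range entirely.
    print(x.cast[DType.float16]())  # overflows to inf
```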
Low-precision types
Fewer bits per value means more values per register, less memory bandwidth, and higher throughput on specialized hardware. You trade mantissa precision for the ability to fit larger models or larger batches on the same silicon. These formats follow the OCP Microscaling Formats (MX) and OFP8 specifications.
There is no single Float8 type in Mojo. It's a colloquial umbrella for
the six 8-bit floating-point variants: Float8_e3m4, Float8_e4m3fn,
Float8_e4m3fnuz, Float8_e5m2, Float8_e5m2fnuz, and Float8_e8m0fnu.
Each is a distinct Scalar alias with its own exponent/mantissa layout and
set of supported operations.
Float8 formats are used in machine learning workloads where memory
bandwidth matters more than precision. These types require GPU hardware for
efficient execution.
Float8 types can't convert to or from any integer type on any
platform, including Bool. They only convert between floating-point
types: Float16, Float32, Float64, BFloat16, and other
supported Float8 variants.
Floating point naming conventions
The suffixes encode special properties of each format:
- fn: finite -- no infinity or negative infinity encodings
- uz: unsigned zero -- no negative zero encoding
- fnu: finite, no sign, unsigned zero
The name encodes the layout: e4m3 means 4 exponent bits and
3 mantissa bits. fn means no infinities, and uz means
unsigned zero.
For example, Float4_e2m1fn is a 4-bit format with 2 exponent
bits and 1 mantissa bit, defined by the Open Compute MX
specification.
Hardware requirements
Support varies significantly by type and operation. None of these types support arithmetic at runtime on CPU.
Arithmetic support (tested on ARM CPU, NVPTX sm_90a, AMDGCN gfx942):
| Type | Comptime | CPU | NVPTX | AMDGCN |
|---|---|---|---|---|
| Float8_e4m3fn | ✅ | ❌ | ✅ | ❌ |
| Float8_e4m3fnuz | ✅ | ❌ | ❌ | ❌ |
| Float8_e5m2 | ✅ | ❌ | ✅ | ❌ |
| Float8_e5m2fnuz | ✅ | ❌ | ❌ | ❌ |
| Float8_e3m4 | ❌ | ❌ | ❌ | ❌ |
NVPTX support for Float8_e4m3fn and Float8_e5m2 is emulated by
the compiler: operands are upconverted to a wider type, the operation
runs in that wider type, and the result is downconverted back. There
are no native fp8 arithmetic instructions.
- Float8_e3m4 has no arithmetic support at any stage, including comptime. Most of its conversions work only at comptime.
- Float4_e2m1fn requires NVIDIA Blackwell (B200) or later.
- Float32 and Float64 are the portable alternatives for CPU and cross-platform code.
IEEE 754 special values
IEEE 754 floating-point types support special values:
| Value | Meaning |
|---|---|
| inf | Positive infinity |
| -inf | Negative infinity |
| nan | Not a number |
| -0.0 | Negative zero |
Access these via SIMD constants:
```mojo
var x = Float32.MAX         # largest value
var y = Float32.MIN         # smallest value
var z = Float32.MAX_FINITE  # largest finite value
var w = Float32.MIN_FINITE  # smallest (most negative) finite value
```

MAX and MIN may be infinite for floating-point types.
MAX_FINITE and MIN_FINITE give the largest and smallest
representable finite values.
Low-precision formats marked fn (finite) don't have infinity
encodings. Formats marked uz (unsigned zero) don't have negative
zero.
Floating point precision
Floating-point arithmetic introduces rounding errors. Two values
that look equal after computation may differ by a tiny amount.
Comparing with == can give unexpected results:
```mojo
# Compile-time: exact result
comptime exact = 3.0 * (4.0 / 3.0 - 1.0)

# Force runtime: rounding error appears
var three = 3.0
var finite = three * (4.0 / three - 1.0)

print(exact, finite)
# 1.0 0.99999999999999978
print(exact == finite)  # False
```

For approximate comparisons, check whether the difference is within
an acceptable tolerance with std.math's is_close().
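For example, a sketch assuming is_close is importable from std.math as noted above, with its default tolerances:

```mojo
from std.math import is_close

def main():
    var three = 3.0
    var finite = three * (4.0 / three - 1.0)
    print(finite == 1.0)          # exact comparison fails
    print(is_close(finite, 1.0))  # tolerance-based comparison succeeds
```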
Numeric literals
Mojo has two compile-time literal types: IntLiteral and
FloatLiteral. They support arbitrary precision and exist
only during compilation.
IntLiteral
When you write a bare integer like 42, its type is
IntLiteral. It doesn't become a concrete type until it's
used in a context that requires one:
```mojo
var a: Int = 42            # Becomes Int
var b: Int8 = 42           # Becomes Int8
var c: Float32 = 42        # Becomes Float32
var d: UInt64 = 1_000_000  # Becomes UInt64
```

IntLiteral is arbitrary-precision at compile time. It has no fixed bit
width, so compile-time calculations won't overflow or lose precision. At
runtime, IntLiteral values materialize to Int:
```mojo
# Compile-time: arbitrary precision, no overflow
comptime big = 2 ** 200

# Runtime: materializes to Int (word-sized)
var x = 42  # IntLiteral 42 materializes to Int
```

IntLiteral supports all arithmetic and comparison operators at
compile time.
FloatLiteral
When you write a decimal constant like 3.14, its type is
FloatLiteral. It doesn't become a concrete type until it's
used in a context that requires one:
```mojo
var x: Float32 = 3.14  # Becomes Float32
var y: Float64 = 3.14  # Becomes Float64
var z: BFloat16 = 0.5  # Becomes BFloat16
```

FloatLiteral provides compile-time constants for special values:
| Constant | Value |
|---|---|
| FloatLiteral.nan | Not a number |
| FloatLiteral.infinity | Positive infinity |
| FloatLiteral.negative_infinity | Negative infinity |
| FloatLiteral.negative_zero | Negative zero |
Use is_nan() and is_neg_zero() to test for these values, since
nan == nan is False and negative_zero == 0.0 is True.
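A hedged sketch of those checks, assuming is_nan() and is_neg_zero() are available as methods on the materialized Float64 value:

```mojo
def main():
    var x: Float64 = FloatLiteral.nan
    print(x == x)      # False: NaN never compares equal, even to itself
    print(x.is_nan())  # use this instead of ==

    var z: Float64 = FloatLiteral.negative_zero
    print(z == 0.0)         # True: -0.0 compares equal to 0.0
    print(z.is_neg_zero())  # distinguishes -0.0 from +0.0
```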
Literals in expressions
Literals adapt to the types around them. When a literal appears next to a typed value, it takes on that value's type:
```mojo
var x = Float32(1.0)
var y = x * 0.5  # 0.5 becomes Float32
var z = x + 2    # 2 becomes Float32
```

This isn't implicit conversion. The literal doesn't have a runtime type yet. It becomes whatever type the context requires.
Variables have a fixed type and never convert implicitly.
Explicit conversions
Converting between numeric types always requires an explicit constructor or cast. Mojo does not perform implicit numeric conversions between variables:
```mojo
var i = 42            # Int
var f = Float32(i)    # Int → Float32
var u = UInt64(i)     # Int → UInt64
var narrow = Int8(i)  # Int → Int8
```

Between SIMD-based types, use .cast[]:
```mojo
var a = Float32(3.14)
var b = a.cast[DType.int32]()    # Float32 → Int32
var c = a.cast[DType.float64]()  # Float32 → Float64
```

Between Int and SIMD-based types, use constructors:
```mojo
var i = 42        # Int
var s = Int64(i)  # Int → Int64
var back = Int(s) # Int64 → Int
```

Why conversions are explicit
Implicit numeric conversions can hide precision loss and sign
changes. For example, Int64(-1) becoming
UInt64(18446744073709551615) is a bug, not a convenience.
Mojo requires an explicit conversion so the intent is clear.
Literals are the exception. A literal like 42 can become
Float32(42.0) because the compiler performs the conversion
at compile time and can guarantee it is exact.
Variables are different. A value like x: Int = 300 becoming
an Int8 would silently lose data, so Mojo requires you to
write the conversion explicitly.
Sharp edges
Int width is platform-dependent
Int is 64-bit on most platforms today, but it's defined as
machine width. Code that assumes 64-bit Int will break on
32-bit targets. Use Int64 when you need a fixed width.
Integer arithmetic wraps on overflow
Integer arithmetic wraps on overflow using two's complement:
- Signed overflow wraps into the negative range. Adding 1 to Int8 value 127 produces -128.
- Unsigned overflow wraps to zero. Adding 1 to UInt8 value 255 produces 0.
Mojo doesn't trap on overflow. If you need overflow detection, check the operands before the operation.
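One hedged sketch of such a pre-check (checked_add is a hypothetical helper, not a standard-library function):

```mojo
fn checked_add(a: UInt8, b: UInt8) raises -> UInt8:
    # a + b wraps exactly when b exceeds the headroom left above a,
    # so test against UInt8.MAX before performing the addition.
    if b > UInt8.MAX - a:
        raise Error("UInt8 addition would overflow")
    return a + b
```

Without such a guard, the addition simply wraps: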
```mojo
var x = Int8(127)
var y = x + Int8(1)  # -128 (wraps)
```

Float-to-int truncates toward zero
```mojo
var x = Int(Float32(3.9))   # 3, not 4
var y = Int(Float32(-3.9))  # -3, not -4
```

NaN comparisons always return False
This includes NaN == NaN. It affects SIMD masks and
conditional selection:
```mojo
var x = Float32.MAX * 2.0  # inf
var nan = x - x            # NaN
print(nan == nan)          # False
```

128-bit and 256-bit integers are software-emulated
Int128, Int256, UInt128, and UInt256 exist but have
limited hardware support on most platforms. Avoid them in
performance-critical code without benchmarking.
Float8 types require GPU hardware
The Float8 variants are designed for ML workloads on GPUs
with native support. On CPUs, operations on these types may
be emulated or unavailable.