Unsafe pointers
The UnsafePointer
type
creates an indirect reference to a location in memory.
You can use an UnsafePointer
to dynamically allocate and free memory, or to
point to memory allocated by some other piece of code. You can use these
pointers to write code that interacts with low-level interfaces, to interface
with other programming languages, or to build certain kinds of data structures.
But as the name suggests, they're inherently unsafe. For example, when using
unsafe pointers, you're responsible for ensuring that memory gets allocated and
freed correctly.
In addition to unsafe pointers, Mojo supports a safe
Reference
type. See
UnsafePointer
and Reference
for a brief
comparison of the types.
What is a pointer?
An UnsafePointer
is a type that holds an address to memory. You can store
and retrieve values in that memory. The UnsafePointer
type is generic—it can
point to any type of value, and the value type is specified as a parameter. The
value pointed to by a pointer is sometimes called a pointee.
from memory.unsafe_pointer import UnsafePointer, initialize_pointee_copy, initialize_pointee_move
# Allocate memory to hold a value
var ptr = UnsafePointer[Int].alloc(1)
# Initialize the allocated memory
initialize_pointee_copy(ptr, 100)
Accessing the memory—to retrieve or update a value—is called dereferencing the pointer. You can dereference a pointer by following the variable name with an empty pair of square brackets:
# Update an initialized value
ptr[] += 10
# Access an initialized value
print(ptr[])
You can also allocate memory to hold multiple values to build array-like structures. For details, see Storing multiple values.
Lifecycle of a pointer
At any given time, a pointer can be in one of several states:
-
Uninitialized. Just like any variable, a variable of type
UnsafePointer
can be declared but uninitialized.var ptr: UnsafePointer[Int]
-
Null. A null pointer has an address of 0, indicating an invalid pointer.
ptr = UnsafePointer[Int]()
-
Pointing to allocated, uninitialized memory. The
alloc()
static method returns a pointer to a newly-allocated block of memory with space for the specified number of elements of the pointee's type.ptr = UnsafePointer[Int].alloc(1)
Trying to dereference a pointer to uninitialized memory results in undefined behavior.
-
Pointing to initialized memory. You can initialize an allocated, uninitialized pointer by moving or copying an existing value into the memory. Or you can use the
address_of()
static method to get a pointer to an existing value.initialize_pointee_copy(ptr, value)
# or
initalize_pointee_move(ptr, value^)
# or
ptr = UnsafePointer[Int].address_of(value)Once the value is initialized, you can read or mutate it using the dereference syntax:
oldValue = ptr[]
ptr[] = newValue -
Dangling. When you free the pointer's allocated memory, you're left with a dangling pointer. The address still points to its previous location, but the memory is no longer allocated to this pointer. Trying to dereference the pointer, or calling any method that would access the memory location results in undefined behavior.
ptr.free()
The following diagram shows the lifecycle of an UnsafePointer
:
UnsafePointer
Allocating memory
Use the static alloc()
method to allocate memory. The method returns a new
pointer pointing to the requested memory. You can allocate space for one or
more values of the pointee's type.
ptr = UnsafePointer[Int].alloc(10) # Allocate space for 10 Int values
The allocated space is uninitialized—like a variable that's been declared but not initialized.
Initializing the pointee
The unsafe_pointer
module includes a number of free functions for working with
the UnsafePointer
type. To initialize allocated memory, you can use the
initialize_pointee_copy()
or initialize_pointee_move()
functions:
initialize_pointee_copy(ptr, 5)
To move a value into the pointer's memory location, use
initialize_pointee_move()
:
initialize_pointee_move(str_ptr, my_string^)
Note that to move the value, you usually need to add the transfer operator
(^
), unless the value is a trivial
type (like
Int
) or a newly-constructed, "owned" value:
initialize_pointee_move(str_ptr, str("Owned string"))
Alternately, you can get a pointer to an existing value using the static
address_of()
method. This is useful for getting a pointer to a value on the
stack, for example.
var counter: Int = 5
ptr = UnsafePointer[Int].address_of(counter)
Note that when calling address_of()
, you don't need to allocate memory ahead
of time, since you're pointing to an existing value.
Initializing from an address
When exchanging data with other programming languages, you may need to construct
an UnsafePointer
from an address. For example, if you're working with a
pointer allocated by a C or C++ library, or a Python object that implements the
array interface protocol,
you can construct an UnsafePointer
to access the data from the Mojo side.
You can construct an UnsafePointer
from an integer address using the address
keyword argument. For example, the following code creates a NumPy array and then
accesses the data using a Mojo pointer:
from python import Python
from memory.unsafe_pointer import UnsafePointer
def share_array():
np = Python.import_module("numpy")
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
addr = int(arr.__array_interface__["data"][0])
ptr = UnsafePointer[Int64](address=addr)
for i in range(9):
print(ptr[i], end=", ")
share_array()
When dealing with memory allocated elsewhere, you need to be aware of who's responsible for freeing the memory. Freeing memory allocated elsewhere can result in undefined behavior.
You also need to be aware of the format of the data stored in memory, including data types and byte order. For more information, see Converting data: bitcasting and byte order.
Dereferencing pointers
Use the []
dereference operator to access the value stored at a pointer (the
"pointee").
# Read from pointee
print(ptr[])
# mutate pointee
ptr[] = 0
If you've allocated space for multiple values, you can use subscript syntax to
access the values, as if they were an array, like ptr[3]
. The empty subscript
[]
has the same meaning as [0]
.
The dereference operator assumes that the memory being dereferenced is initialized. Dereferencing uninitialized memory results in undefined behavior.
You cannot safely use the dereference operator on uninitialized memory, even to initialize a pointee. This is because assigning to a dereferenced pointer calls lifecycle methods on the existing pointee (such as the destructor, move constructor or copy constructor).
str_ptr = UnsafePointer[String].alloc(1)
# str_ptr[] = "Testing" # Undefined behavior!
initialize_pointee_move(str_ptr, "Testing")
str_ptr[] += " pointers" # Works now
Destroying or removing values
The
move_from_pointee(ptr)
function moves the pointee from the memory location pointed to by ptr
. This is
a consuming move—it invokes __moveinit__()
on the destination value. It leaves
the memory location uninitialized.
The destroy_pointee(ptr)
function calls the destructor on the pointee, and leaves the memory location
pointed to by ptr
uninitialized.
Both move_from_pointee()
and destroy_pointee()
require that the pointer is
non-null, and the memory location contains a valid, initialized value of the
pointee's type; otherwise the function results in undefined behavior.
The move_pointee(src, dst)
function moves the pointee from one pointer location to another. Both pointers
must be non-null. The source location must contain a valid, initialized value of
the pointee's type, and is left uninitialized after the call. The destination
location is assumed to be uninitialized—if it contains a valid value, that
value's destructor is not run. The value from the source location is moved to
the destination location as a consuming move. This function also has undefined
behavior if any of its prerequisites is not met.
Freeing memory
Calling free()
on a
pointer frees the memory allocated by the pointer. It doesn't call the
destructors on any values stored in the memory—you need to do that explicitly
(for example, using
destroy_pointee()
or
one of the other functions described in
Destroying or removing values).
Disposing of a pointer without freeing the associated memory can result in a memory leak—where your program keeps taking more and more memory, because not all allocated memory is being freed.
On the other hand, if you have multiple copies of a pointer accessing the same
memory, you need to make sure you only call free()
on one of them. Freeing the
same memory twice is also an error.
After freeing a pointer's memory, you're left with a dangling pointer—its address still points to the freed memory. Any attempt to access the memory, like dereferencing the pointer results in undefined behavior.
Storing multiple values
As mentioned in Allocating memory, you can use an
UnsafePointer
to allocate memory for multiple values. The memory is allocated
as a single, contiguous block. Pointers support arithmetic: adding an integer
to a pointer returns a new pointer offset by the specified number of values from
the original pointer:
third_ptr = first_ptr + 2
Pointers also support subtraction, as well as in-place addition and subtraction:
# Advance the pointer one element:
ptr += 1
For example, the following example allocates memory to store 6 Float64
values, and initializes them all to zero.
float_ptr = UnsafePointer[Float64].alloc(6)
for offset in range(6):
initialize_pointee_copy(float_ptr+offset, 0.0)
Once the values are initialized, you can access them using subscript syntax:
float_ptr[2] = 3.0
for offset in range(6):
print(float_ptr[offset], end=", ")
Converting data: bitcasting and byte order
Bitcasting a pointer returns a new pointer that has the same memory location, but a new data type. This can be useful if you need to access different types of data from a single area of memory. This can happen when you're reading binary files, like image files, or receiving data over the network.
The following sample processes a format that consists of chunks of data, where each chunk contains a variable number of 32-bit integers. Each chunk begins with an 8-bit integer that identifies the number of values in the chunk.
def read_chunks(owned ptr: UnsafePointer[UInt8]) -> List[List[UInt32]]:
chunks = List[List[UInt32]]()
# A chunk size of 0 indicates the end of the data
chunk_size = int(ptr[])
while (chunk_size > 0):
# Skip the 1 byte chunk_size and get a pointer to the first
# UInt32 in the chunk
ui32_ptr = (ptr + 1).bitcast[UInt32]()
chunk = List[UInt32](capacity=chunk_size)
for i in range(chunk_size):
chunk.append(ui32_ptr[i])
chunks.append(chunk)
# Move our pointer to the next byte after the current chunk
ptr += (1 + 4 * chunk_size)
# Read the size of the next chunk
chunk_size = int(ptr[])
return chunks
When dealing with data read in from a file or from the network, you may also need to deal with byte order. Most systems use little-endian byte order (also called least-signficicant byte, or LSB) where the least-significant byte in a multibyte value comes first. For example, the number 1001 can be represented in hexadecimal as 0x03E9, where E9 is the least-significant byte. Represented as a 16-bit little-endian integer, the two bytes are ordered E9 03. As a 32-bit integer, it would be represented as E9 03 00 00.
Big-endian or most-significant byte (MSB) ordering is the opposite: in the
32-bit case, 00 00 03 E9. MSB ordering is frequently used in file formats and
when transmitting data over the network. You can use the
byte_swap()
function to swap the byte
order of a SIMD value from big-endian to little-endian or the reverse. For
example, if the method above was reading big-endian data, you'd just need to
change a single line:
chunk.append(byte_swap(ui32_ptr[i]))
DTypePointer
: handling numeric data
A DTypePointer
is an unsafe
pointer that supports some additional methods for loading and storing numeric
data. Like the SIMD
type, it's parameterized
on DType
as described in
SIMD and DType.
DTypePointer
has a similar API to UnsafePointer
:
- You can
alloc()
andfree()
memory, or useaddress_of()
to point to an existing value. - The pointer supports pointer arithmetic to access adjacent memory locations.
- You can dereference a
DTypePointer
using subscript notation. - You can construct a
DTypePointer
from anInt
address.
You can also construct a DTypePointer
from an UnsafePointer
of a scalar
type like Int64
or Float32
:
from memory import DTypePointer, UnsafePointer
uptr = UnsafePointer[Float64].alloc(10)
dptr = DTypePointer(uptr)
# Or:
dptr = DTypePointer[DType.float64].alloc(10)
Unlike UnsafePointer
, DTypePointer
doesn't have special methods to
initialize values, destroy them, or move them out. Because all of the values
that DTypePointer
works with are trivial types, DTypePointer
doesn't need to
destroy values before overwriting them or freeing memory. Instead, you can use
subscript notation (like UnsafePointer
) or use the
load()
and
store()
methods to access
values.
What DTypePointer
adds is various methods of loading and storing SIMD values
to memory. In particular: strided load/store and gather/scatter.
Strided load loads values from memory into a SIMD vector using an offset (the "stride") between successive memory addresses. This can be useful for extracting rows or columns from tabular data, or for extracting individual values from structured data. For example, consider the data for an RGB image, where each pixel is made up of three 8-bit values, for red, green, and blue. If you want to access just the red values, you can use a strided load or store.
The following function uses the
simd_strided_load()
and
simd_strided_store()
methods to invert the red pixel values in an image, 8 values at a time. (Note
that this function only handles images where the number of pixels is evenly
divisible by eight.)
def invert_red_channel(ptr: DTypePointer[DType.uint8], pixel_count: Int):
# number of values loaded or stored at a time
alias simd_width = 8
# bytes per pixel, which is also the stride size
bpp = 3
for i in range(0, pixel_count * bpp, simd_width * bpp):
red_values = ptr.offset(i).simd_strided_load[width=simd_width](bpp)
# Invert values and store them in their original locations
ptr.offset(i).simd_strided_store[width=simd_width](~red_values, bpp)
DTypePointer
The DTypePointer
type exists for historical reasons, but it no longer really
needs to be a separate type. UnsafePointer
can handle most things that
DTypePointer
does except for a few features related to reading and writing
SIMD
values. At some point in the future, these features will probably be
integrated into the SIMD
type, so you can use them with UnsafePointer
.
Safety
Unsafe pointers are unsafe for several reasons:
-
Memory management is up to the user. You need to manually allocate and free memory, and be aware of when other APIs are allocating or freeing memory for you.
-
UnsafePointer
andDTypePointer
values are nullable—that is, the pointer is not guaranteed to point to anything. And even when a pointer points to allocated memory, that memory may not be initialized. -
Mojo doesn't track lifetimes for the data pointed to by an
UnsafePointer
. When you use anUnsafePointer
, managing memory and knowing when to destroy objects is your responsibility. (SinceDTypePointer
only works with trivial types, this is not typically an issue forDTypePointer
.)
UnsafePointer
and Reference
The Reference
type is essentially a
safe pointer type. Like a pointer, you can derferences a Reference
using the
dereference operator, []
. However, the Reference
type has several
differences from UnsafePointer
which make it safer:
- A
Reference
is non-nullable. A reference always points to something. - You can't allocate or free memory using a
Reference
—only point to an existing value. - A
Reference
only refers to a single value. You can't do pointer arithmetic with aReference
. - A
Reference
has an associated lifetime, which connects it back to an original, owned value. The lifetime ensures that the value won't be destroyed while the reference exists.
The Reference
type shouldn't be confused with the immutable and mutable
references used with the borrowed
and inout
argument conventions. Those
references do not require explicit dereferencing, unlike a Reference
or
UnsafePointer
.