Unsafe pointers

The UnsafePointer type creates an indirect reference to a location in memory. You can use an UnsafePointer to dynamically allocate and free memory, or to point to memory allocated by some other piece of code. You can use these pointers to write code that interacts with low-level interfaces, to interface with other programming languages, or to build certain kinds of data structures. But as the name suggests, they're inherently unsafe. For example, when using unsafe pointers, you're responsible for ensuring that memory gets allocated and freed correctly.

note

In addition to unsafe pointers, Mojo supports a safe Reference type. See UnsafePointer and Reference for a brief comparison of the types.

What is a pointer?

An UnsafePointer is a type that holds an address to memory. You can store and retrieve values in that memory. The UnsafePointer type is generic—it can point to any type of value, and the value type is specified as a parameter. The value pointed to by a pointer is sometimes called a pointee.

from memory.unsafe_pointer import UnsafePointer, initialize_pointee_copy, initialize_pointee_move

# Allocate memory to hold a value
var ptr = UnsafePointer[Int].alloc(1)
# Initialize the allocated memory
initialize_pointee_copy(ptr, 100)

Figure 1. Pointer and pointee

Accessing the memory—to retrieve or update a value—is called dereferencing the pointer. You can dereference a pointer by following the variable name with an empty pair of square brackets:

# Update an initialized value
ptr[] += 10
# Access an initialized value
print(ptr[])
110

You can also allocate memory to hold multiple values to build array-like structures. For details, see Storing multiple values.

Lifecycle of a pointer

At any given time, a pointer can be in one of several states:

  • Uninitialized. Just like any variable, a variable of type UnsafePointer can be declared but uninitialized.

    var ptr: UnsafePointer[Int]
  • Null. A null pointer has an address of 0, indicating an invalid pointer.

    ptr = UnsafePointer[Int]()
  • Pointing to allocated, uninitialized memory. The alloc() static method returns a pointer to a newly-allocated block of memory with space for the specified number of elements of the pointee's type.

    ptr = UnsafePointer[Int].alloc(1)

    Trying to dereference a pointer to uninitialized memory results in undefined behavior.

  • Pointing to initialized memory. You can initialize an allocated, uninitialized pointer by moving or copying an existing value into the memory. Or you can use the address_of() static method to get a pointer to an existing value.

    initialize_pointee_copy(ptr, value)
    # or
    initialize_pointee_move(ptr, value^)
    # or
    ptr = UnsafePointer[Int].address_of(value)

    Once the value is initialized, you can read or mutate it using the dereference syntax:

    oldValue = ptr[]
    ptr[] = newValue
  • Dangling. When you free the pointer's allocated memory, you're left with a dangling pointer. The address still points to its previous location, but the memory is no longer allocated to this pointer. Dereferencing the pointer, or calling any method that would access the memory location, results in undefined behavior.

    ptr.free()

The following diagram shows the lifecycle of an UnsafePointer:

Figure 2. Lifecycle of an UnsafePointer
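
As a rough sketch, the states above can be walked through in a single function. It uses only the functions named on this page; the function name lifecycle_demo is hypothetical, and the import paths assume the same unsafe_pointer module shown earlier.

```mojo
from memory.unsafe_pointer import UnsafePointer
from memory.unsafe_pointer import initialize_pointee_copy, destroy_pointee


def lifecycle_demo() -> Int:
    # Null: the default constructor yields a pointer with address 0.
    var ptr = UnsafePointer[Int]()
    # Allocated but uninitialized: must not be dereferenced yet.
    ptr = UnsafePointer[Int].alloc(1)
    # Initialized: copy a value into the allocated memory.
    initialize_pointee_copy(ptr, 100)
    # Dereferencing is valid now.
    ptr[] += 10
    var result = ptr[]
    # Destroy the pointee (trivial for Int, but needed in general),
    # then free the memory. After free(), ptr is dangling.
    destroy_pointee(ptr)
    ptr.free()
    return result


def main():
    print(lifecycle_demo())
```

The value is copied into a local variable before the memory is freed, so nothing reads through the dangling pointer afterwards.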

Allocating memory

Use the static alloc() method to allocate memory. The method returns a new pointer pointing to the requested memory. You can allocate space for one or more values of the pointee's type.

ptr = UnsafePointer[Int].alloc(10) # Allocate space for 10 Int values

The allocated space is uninitialized—like a variable that's been declared but not initialized.

Initializing the pointee

The unsafe_pointer module includes a number of free functions for working with the UnsafePointer type. To initialize allocated memory, you can use the initialize_pointee_copy() or initialize_pointee_move() functions:

initialize_pointee_copy(ptr, 5)

To move a value into the pointer's memory location, use initialize_pointee_move():

initialize_pointee_move(str_ptr, my_string^)

Note that to move the value, you usually need to add the transfer operator (^), unless the value is a trivial type (like Int) or a newly-constructed, "owned" value:

initialize_pointee_move(str_ptr, str("Owned string"))

Alternately, you can get a pointer to an existing value using the static address_of() method. This is useful for getting a pointer to a value on the stack, for example.

var counter: Int = 5
ptr = UnsafePointer[Int].address_of(counter)

Note that when calling address_of(), you don't need to allocate memory ahead of time, since you're pointing to an existing value.

Initializing from an address

When exchanging data with other programming languages, you may need to construct an UnsafePointer from an address. For example, if you're working with a pointer allocated by a C or C++ library, or a Python object that implements the array interface protocol, you can construct an UnsafePointer to access the data from the Mojo side.

You can construct an UnsafePointer from an integer address using the address keyword argument. For example, the following code creates a NumPy array and then accesses the data using a Mojo pointer:

from python import Python
from memory.unsafe_pointer import UnsafePointer

def share_array():
    np = Python.import_module("numpy")
    arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    addr = int(arr.__array_interface__["data"][0])
    ptr = UnsafePointer[Int64](address=addr)
    for i in range(9):
        print(ptr[i], end=", ")

share_array()
1, 2, 3, 4, 5, 6, 7, 8, 9,

When dealing with memory allocated elsewhere, you need to be aware of who's responsible for freeing the memory. Freeing memory allocated elsewhere can result in undefined behavior.

You also need to be aware of the format of the data stored in memory, including data types and byte order. For more information, see Converting data: bitcasting and byte order.

Dereferencing pointers

Use the [] dereference operator to access the value stored at a pointer (the "pointee").

# Read from pointee
print(ptr[])
# mutate pointee
ptr[] = 0

110

If you've allocated space for multiple values, you can use subscript syntax to access the values, as if they were an array, like ptr[3]. The empty subscript [] has the same meaning as [0].

caution

The dereference operator assumes that the memory being dereferenced is initialized. Dereferencing uninitialized memory results in undefined behavior.

You cannot safely use the dereference operator on uninitialized memory, even to initialize a pointee. This is because assigning to a dereferenced pointer calls lifecycle methods on the existing pointee (such as the destructor, move constructor or copy constructor).

str_ptr = UnsafePointer[String].alloc(1)
# str_ptr[] = "Testing" # Undefined behavior!
initialize_pointee_move(str_ptr, "Testing")
str_ptr[] += " pointers" # Works now

Destroying or removing values

The move_from_pointee(ptr) function moves the pointee from the memory location pointed to by ptr. This is a consuming move—it invokes __moveinit__() on the destination value. It leaves the memory location uninitialized.

The destroy_pointee(ptr) function calls the destructor on the pointee, and leaves the memory location pointed to by ptr uninitialized.

Both move_from_pointee() and destroy_pointee() require that the pointer is non-null, and the memory location contains a valid, initialized value of the pointee's type; otherwise the function results in undefined behavior.

The move_pointee(src, dst) function moves the pointee from one pointer location to another. Both pointers must be non-null. The source location must contain a valid, initialized value of the pointee's type, and is left uninitialized after the call. The destination location is assumed to be uninitialized—if it contains a valid value, that value's destructor is not run. The value from the source location is moved to the destination location as a consuming move. This function also has undefined behavior if any of its prerequisites is not met.
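
The following sketch puts the three functions side by side. It assumes they are importable from the same unsafe_pointer module as the initialization functions, and that move_pointee() accepts src and dst as keyword arguments. The function name move_demo is hypothetical.

```mojo
from memory.unsafe_pointer import UnsafePointer
from memory.unsafe_pointer import initialize_pointee_move, destroy_pointee
from memory.unsafe_pointer import move_from_pointee, move_pointee


def move_demo() -> String:
    var src = UnsafePointer[String].alloc(1)
    var dst = UnsafePointer[String].alloc(1)
    initialize_pointee_move(src, String("hello"))

    # Move the value from src to dst. src is left uninitialized;
    # dst now holds the value.
    move_pointee(src=src, dst=dst)

    # Move the value out of dst into a local variable.
    # dst is left uninitialized as well.
    var value = move_from_pointee(dst)

    # Re-initialize src so there is a live value to destroy.
    initialize_pointee_move(src, String("temporary"))
    # Run the String destructor; src is uninitialized again.
    destroy_pointee(src)

    # Neither location holds a live value, so both can be freed.
    src.free()
    dst.free()
    return value


def main():
    print(move_demo())
```

At every step, each memory location is either initialized or uninitialized, and the function called matches that state. Keeping track of this is the caller's job.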

Freeing memory

Calling free() on a pointer frees the memory allocated by the pointer. It doesn't call the destructors on any values stored in the memory—you need to do that explicitly (for example, using destroy_pointee() or one of the other functions described in Destroying or removing values).

Disposing of a pointer without freeing the associated memory can result in a memory leak—where your program keeps taking more and more memory, because not all allocated memory is being freed.

On the other hand, if you have multiple copies of a pointer accessing the same memory, you need to make sure you only call free() on one of them. Freeing the same memory twice is also an error.

After freeing a pointer's memory, you're left with a dangling pointer: its address still points to the freed memory. Any attempt to access the memory, such as dereferencing the pointer, results in undefined behavior.
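
One way to keep destruction and freeing together is a small helper that destroys each element and then frees the block exactly once. This is a sketch under the assumptions of this page's API; free_strings and free_demo are hypothetical names, not library functions.

```mojo
from memory.unsafe_pointer import UnsafePointer
from memory.unsafe_pointer import initialize_pointee_move, destroy_pointee


def free_strings(ptr: UnsafePointer[String], count: Int):
    # Run the destructor on every initialized element first...
    for i in range(count):
        destroy_pointee(ptr + i)
    # ...then release the whole block with a single free() call.
    ptr.free()


def free_demo() -> Int:
    var count = 3
    var ptr = UnsafePointer[String].alloc(count)
    for i in range(count):
        initialize_pointee_move(ptr + i, String("item"))
    # Use the values while they are still alive.
    var total = 0
    for i in range(count):
        total += len(ptr[i])
    free_strings(ptr, count)
    # ptr is dangling from here on and must not be used.
    return total


def main():
    print(free_demo())
```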

Storing multiple values

As mentioned in Allocating memory, you can use an UnsafePointer to allocate memory for multiple values. The memory is allocated in a single, contiguous block. Pointers support arithmetic: adding an integer to a pointer returns a new pointer offset by the specified number of values from the original pointer:

third_ptr = first_ptr + 2

Pointers also support subtraction, as well as in-place addition and subtraction:

# Advance the pointer one element:
ptr += 1

Figure 3. Pointer arithmetic

For example, the following code allocates memory to store 6 Float64 values, and initializes them all to zero.

float_ptr = UnsafePointer[Float64].alloc(6)
for offset in range(6):
    initialize_pointee_copy(float_ptr + offset, 0.0)

Once the values are initialized, you can access them using subscript syntax:

float_ptr[2] = 3.0
for offset in range(6):
    print(float_ptr[offset], end=", ")
0.0, 0.0, 3.0, 0.0, 0.0, 0.0,

Converting data: bitcasting and byte order

Bitcasting a pointer returns a new pointer that has the same memory location, but a new data type. This can be useful if you need to access different types of data from a single area of memory. This can happen when you're reading binary files, like image files, or receiving data over the network.

The following sample processes a format that consists of chunks of data, where each chunk contains a variable number of 32-bit integers. Each chunk begins with an 8-bit integer that identifies the number of values in the chunk.


def read_chunks(owned ptr: UnsafePointer[UInt8]) -> List[List[UInt32]]:
    chunks = List[List[UInt32]]()
    # A chunk size of 0 indicates the end of the data
    chunk_size = int(ptr[])
    while (chunk_size > 0):
        # Skip the 1 byte chunk_size and get a pointer to the first
        # UInt32 in the chunk
        ui32_ptr = (ptr + 1).bitcast[UInt32]()
        chunk = List[UInt32](capacity=chunk_size)
        for i in range(chunk_size):
            chunk.append(ui32_ptr[i])
        chunks.append(chunk)
        # Move our pointer to the next byte after the current chunk
        ptr += (1 + 4 * chunk_size)
        # Read the size of the next chunk
        chunk_size = int(ptr[])
    return chunks

When dealing with data read in from a file or from the network, you may also need to deal with byte order. Most systems use little-endian byte order (also called least-significant byte, or LSB) where the least-significant byte in a multibyte value comes first. For example, the number 1001 can be represented in hexadecimal as 0x03E9, where E9 is the least-significant byte. Represented as a 16-bit little-endian integer, the two bytes are ordered E9 03. As a 32-bit integer, it would be represented as E9 03 00 00.

Big-endian or most-significant byte (MSB) ordering is the opposite: in the 32-bit case, 00 00 03 E9. MSB ordering is frequently used in file formats and when transmitting data over the network. You can use the byte_swap() function to swap the byte order of a SIMD value from big-endian to little-endian or the reverse. For example, if the function above were reading big-endian data, you'd just need to change a single line:

chunk.append(byte_swap(ui32_ptr[i]))
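
To see the byte layout of 1001 directly, the sketch below stores it as a UInt32 and reads the same memory back one byte at a time through a bitcast pointer. The import path for byte_swap() (the bit module) is an assumption and may differ between Mojo versions. The function names are hypothetical.

```mojo
from bit import byte_swap  # assumed import path; may vary by version
from memory.unsafe_pointer import UnsafePointer, initialize_pointee_copy


def print_bytes_of_1001():
    var ptr = UnsafePointer[UInt32].alloc(1)
    initialize_pointee_copy(ptr, UInt32(1001))  # 0x000003E9
    # View the same four bytes as individual UInt8 values.
    var bytes = ptr.bitcast[UInt8]()
    # On a little-endian machine the bytes appear in the order
    # E9 03 00 00 (233 3 0 0 in decimal).
    for i in range(4):
        print(bytes[i], end=" ")
    print("")
    ptr.free()


def swap_1001() -> UInt32:
    # Reversing the four bytes of 0x000003E9 gives 0xE9030000.
    return byte_swap(UInt32(1001))


def main():
    print_bytes_of_1001()
    print(swap_1001())
```

The byte-by-byte output depends on the host's byte order, while the result of byte_swap() does not.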

DTypePointer: handling numeric data

The DTypePointer is an unsafe pointer that supports some additional methods for loading and storing numeric data. Like the SIMD type, it's parameterized on DType as described in SIMD and DType.

DTypePointer has a similar API to UnsafePointer:

  • You can alloc() and free() memory, or use address_of() to point to an existing value.
  • The pointer supports pointer arithmetic to access adjacent memory locations.
  • You can dereference a DTypePointer using subscript notation.
  • You can construct a DTypePointer from an Int address.

You can also construct a DTypePointer from an UnsafePointer of a scalar type like Int64 or Float32:

from memory import DTypePointer, UnsafePointer

uptr = UnsafePointer[Float64].alloc(10)
dptr = DTypePointer(uptr)
# Or:
dptr = DTypePointer[DType.float64].alloc(10)

Unlike UnsafePointer, DTypePointer doesn't have special methods to initialize values, destroy them, or move them out. Because all of the values that DTypePointer works with are trivial types, DTypePointer doesn't need to destroy values before overwriting them or freeing memory. Instead, you can use subscript notation (like UnsafePointer) or use the load() and store() methods to access values.
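
As a minimal sketch of both access styles, the function below writes scalars with subscript notation and then reads and writes four values at once with load() and store(). The width keyword parameter and the (offset, value) argument order are assumptions about the method signatures. The function name load_store_demo is hypothetical.

```mojo
from memory import DTypePointer


def load_store_demo() -> Float64:
    var dptr = DTypePointer[DType.float64].alloc(4)
    # Write scalar values with subscript notation. No separate
    # initialization step is needed for trivial numeric types.
    for i in range(4):
        dptr[i] = Float64(i)
    # Load all four values as one SIMD vector (assumed signature).
    var values = dptr.load[width=4](0)
    # Double each element and store the vector back at offset 0.
    dptr.store[width=4](0, values + values)
    var last = dptr[3]
    # No destructors to run for trivial types; just free the memory.
    dptr.free()
    return last


def main():
    print(load_store_demo())
```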

What DTypePointer adds is various methods of loading and storing SIMD values to memory. In particular: strided load/store and gather/scatter.

Strided load loads values from memory into a SIMD vector using an offset (the "stride") between successive memory addresses. This can be useful for extracting rows or columns from tabular data, or for extracting individual values from structured data. For example, consider the data for an RGB image, where each pixel is made up of three 8-bit values, for red, green, and blue. If you want to access just the red values, you can use a strided load or store.

Figure 4. Strided load

The following function uses the simd_strided_load() and simd_strided_store() methods to invert the red pixel values in an image, 8 values at a time. (Note that this function only handles images where the number of pixels is evenly divisible by eight.)

def invert_red_channel(ptr: DTypePointer[DType.uint8], pixel_count: Int):
    # number of values loaded or stored at a time
    alias simd_width = 8
    # bytes per pixel, which is also the stride size
    bpp = 3
    for i in range(0, pixel_count * bpp, simd_width * bpp):
        red_values = ptr.offset(i).simd_strided_load[width=simd_width](bpp)
        # Invert values and store them in their original locations
        ptr.offset(i).simd_strided_store[width=simd_width](~red_values, bpp)

Future of DTypePointer

The DTypePointer type exists for historical reasons, but it no longer really needs to be a separate type. UnsafePointer can handle most things that DTypePointer does except for a few features related to reading and writing SIMD values. At some point in the future, these features will probably be integrated into the SIMD type, so you can use them with UnsafePointer.

Safety

Unsafe pointers are unsafe for several reasons:

  • Memory management is up to the user. You need to manually allocate and free memory, and be aware of when other APIs are allocating or freeing memory for you.

  • UnsafePointer and DTypePointer values are nullable—that is, the pointer is not guaranteed to point to anything. And even when a pointer points to allocated memory, that memory may not be initialized.

  • Mojo doesn't track lifetimes for the data pointed to by an UnsafePointer. When you use an UnsafePointer, managing memory and knowing when to destroy objects is your responsibility. (Since DTypePointer only works with trivial types, this is not typically an issue for DTypePointer.)

UnsafePointer and Reference

The Reference type is essentially a safe pointer type. As with a pointer, you can dereference a Reference using the dereference operator, []. However, the Reference type has several differences from UnsafePointer which make it safer:

  • A Reference is non-nullable. A reference always points to something.
  • You can't allocate or free memory using a Reference—only point to an existing value.
  • A Reference only refers to a single value. You can't do pointer arithmetic with a Reference.
  • A Reference has an associated lifetime, which connects it back to an original, owned value. The lifetime ensures that the value won't be destroyed while the reference exists.

The Reference type shouldn't be confused with the immutable and mutable references used with the borrowed and inout argument conventions. Those references do not require explicit dereferencing, unlike a Reference or UnsafePointer.