Skip to main content

function

hash

hash[T: Hashable](hashable: T) -> Int

Hash a Hashable type using its underlying hash implementation.

Parameters:

  • T (Hashable): Any Hashable type.

Args:

  • hashable (T): The input data to hash.

Returns:

A 64-bit integer hash based on the underlying implementation.

hash(bytes: DTypePointer[uint8, 0], n: Int) -> Int

Hash a byte array using a SIMD-modified DJBX33A hash algorithm.

Similar to hash(bytes: DTypePointer[DType.int8], n: Int) -> Int but takes a DTypePointer[DType.uint8] instead of DTypePointer[DType.int8]. See the overload for a complete description of the algorithm.

Args:

  • bytes (DTypePointer[uint8, 0]): The byte array to hash.
  • n (Int): The length of the byte array.

Returns:

A 64-bit integer hash. This hash is not suitable for cryptographic purposes, but will have good low-bit hash collision statistical properties for common data structures.

hash(bytes: DTypePointer[int8, 0], n: Int) -> Int

Hash a byte array using a SIMD-modified hash algorithm.

This hash function is not suitable for cryptographic purposes. The algorithm is easy to reverse and produce deliberate hash collisions. The hash function is designed to have relatively good mixing and statistical properties for use in hash-based data structures. We do however initialize a random hash secret which is mixed into the final hash output. This can help prevent DDOS attacks on applications which make use of this function for dictionary hashing. As a consequence, hash values are deterministic within an individual runtime instance ie. a value will always hash to the same thing, but in between runs this value will change based on the hash secret.

We take advantage of Mojo's first-class SIMD support to create a SIMD-vectorized hash function, using some simple hash algorithm as a base.

  • Interpret those bytes as a SIMD vector, padded with zeros to align to the system SIMD width.
  • Apply the simple hash function parallelized across SIMD vectors.
  • Hash the final SIMD vector state to reduce to a single value.

Python uses DJBX33A with a hash secret for smaller strings, and then the SipHash algorithm for longer strings. The arguments and tradeoffs are well documented in PEP 456. We should consider this and deeper performance/security tradeoffs as Mojo evolves.

References:

from random import rand
var n = 64
var rand_bytes = DTypePointer[DType.int8].alloc(n)
rand(rand_bytes, n)
hash(rand_bytes, n)

Args:

  • bytes (DTypePointer[int8, 0]): The byte array to hash.
  • n (Int): The length of the byte array.

Returns:

A 64-bit integer hash. This hash is not suitable for cryptographic purposes, but will have good low-bit hash collision statistical properties for common data structures.