Skip to main content

function

hash

hash[T: Hashable](hashable: T) -> Int

Hash a Hashable type using its underlying hash implementation.

Parameters:

  • T (Hashable): Any Hashable type.

Args:

  • hashable (T): The input data to hash.

Returns:

A 64-bit integer hash based on the underlying implementation.

hash(bytes: DTypePointer[int8, 0], n: Int) -> Int

Hash a byte array using a SIMD-modified DJBX33A hash algorithm.

The DJBX33A algorithm is commonly used for data structures that rely on well-distributed hashing for performance. The low order bits of the result depend on each byte in the input, meaning that single-byte changes will result in a changed hash even when masking out most bits eg. for small dictionaries.

This hash function is not suitable for cryptographic purposes. The algorithm is easy to reverse and produce deliberate hash collisions. We do however initialize a random hash secret which is mixed into the final hash output. This can help prevent DDOS attacks on applications which make use of this function for dictionary hashing. As a consequence, hash values are deterministic within an individual runtime instance ie. a value will always hash to the same thing, but in between runs this value will change based on the hash secret.

Standard DJBX33A is:

  • Set hash = 5361
  • For each byte: hash = 33 * hash + byte

Instead, for all bytes except trailing bytes that don't align to the max SIMD vector width, we:

  • Interpret those bytes as a SIMD vector.
  • Apply a vectorized hash: v = 33 * v + bytes_as_simd_value
  • Call reduce_add() on the final result to get a single hash value.
  • Use this value in fallback for the remaining suffix bytes with standard DJBX33A.

Python uses DJBX33A with a hash secret for smaller strings, and then the SipHash algorithm for longer strings. The arguments and tradeoffs are well documented in PEP 456. We should consider this and deeper performance/security tradeoffs as Mojo evolves.

References:

from random import rand
var n = 64
var rand_bytes = DTypePointer[DType.int8].alloc(n)
rand(rand_bytes, n)
hash(rand_bytes, n)

Args:

  • bytes (DTypePointer[int8, 0]): The byte array to hash.
  • n (Int): The length of the byte array.

Returns:

A 64-bit integer hash. This hash is not suitable for cryptographic purposes, but will have good low-bit hash collision statistical properties for common data structures.