struct

NDBuffer

An N-dimensional Buffer.

NDBuffer can be parameterized on rank, static dimensions, and DType. It does not own its underlying pointer.

Parameters

  • type (DType): The element type of the buffer.
  • rank (Int): The rank of the buffer.
  • shape (DimList): The static size (if known) of the buffer.
  • address_space (AddressSpace): The address space of the buffer.

Fields

  • data (DTypePointer[type, address_space]): The underlying data for the buffer. The pointer is not owned by the NDBuffer.
  • dynamic_shape (StaticIntTuple[rank]): The dynamic value of the shape.
  • dynamic_stride (StaticIntTuple[rank]): The dynamic stride of the buffer.
  • is_contiguous (Bool): True if the contents of the buffer are contiguous in memory.
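The relationship between dynamic_shape, dynamic_stride, and is_contiguous can be modeled with a few lines of plain Python. This is a sketch, not the Mojo implementation: it assumes a row-major layout with strides counted in elements, neither of which this page states, and the helper names row_major_strides and is_contiguous are hypothetical.

```python
def row_major_strides(shape):
    """Strides, in elements, of a densely packed row-major buffer (assumed layout)."""
    strides = [1] * len(shape)
    # Walk from the second-to-last axis outward; each stride is the
    # product of all extents to its right.
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return tuple(strides)


def is_contiguous(shape, strides):
    """True when the strides match the dense row-major layout."""
    return tuple(strides) == row_major_strides(shape)


shape = (2, 3, 4)
dense = row_major_strides(shape)   # (12, 4, 1)
padded = (16, 4, 1)                # outermost axis has padding between slabs

print(dense, is_contiguous(shape, dense), is_contiguous(shape, padded))
```

Under these assumptions, a buffer built without an explicit dynamic_stride would be dense, while one built with padded strides would not be contiguous.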

Implemented traits

AnyType, Copyable, Movable, Sized, Stringable

Methods

__init__

__init__(inout self: Self, /)

Default initializer for NDBuffer. By default the fields are all initialized to 0.

__init__(inout self: Self, /, ptr: LegacyPointer[SIMD[type, 1], address_space])

Constructs an NDBuffer with statically known rank, shapes and type.

Constraints:

The rank, shapes, and type are known.

Args:

  • ptr (LegacyPointer[SIMD[type, 1], address_space]): Pointer to the data.

__init__(inout self: Self, /, ptr: DTypePointer[type, address_space])

Constructs an NDBuffer with statically known rank, shapes and type.

Constraints:

The rank, shapes, and type are known.

Args:

  • ptr (DTypePointer[type, address_space]): Pointer to the data.

__init__(inout self: Self, /, ptr: LegacyPointer[scalar<#lit.struct.extract<:_stdlib::_builtin::_dtype::_DType type, "value">>, address_space], dynamic_shape: StaticIntTuple[rank])

Constructs an NDBuffer with statically known rank, but dynamic shapes and type.

Constraints:

The rank is known.

Args:

  • ptr (LegacyPointer[scalar<#lit.struct.extract<:_stdlib::_builtin::_dtype::_DType type, "value">>, address_space]): Pointer to the data.
  • dynamic_shape (StaticIntTuple[rank]): A static tuple of size 'rank' representing shapes.

__init__(inout self: Self, /, ptr: DTypePointer[type, address_space], dynamic_shape: StaticIntTuple[rank])

Constructs an NDBuffer with statically known rank, but dynamic shapes and type.

Constraints:

The rank is known.

Args:

  • ptr (DTypePointer[type, address_space]): Pointer to the data.
  • dynamic_shape (StaticIntTuple[rank]): A static tuple of size 'rank' representing shapes.

__init__(inout self: Self, /, ptr: DTypePointer[type, address_space], dynamic_shape: DimList)

Constructs an NDBuffer with statically known rank, but dynamic shapes and type.

Constraints:

The rank is known.

Args:

  • ptr (DTypePointer[type, address_space]): Pointer to the data.
  • dynamic_shape (DimList): A DimList of size 'rank' representing shapes.

__init__(inout self: Self, /, ptr: LegacyPointer[SIMD[type, 1], address_space], dynamic_shape: StaticIntTuple[rank], dynamic_stride: StaticIntTuple[rank])

Constructs a strided NDBuffer with statically known rank, but dynamic shapes and type.

Constraints:

The rank is known.

Args:

  • ptr (LegacyPointer[SIMD[type, 1], address_space]): Pointer to the data.
  • dynamic_shape (StaticIntTuple[rank]): A static tuple of size 'rank' representing shapes.
  • dynamic_stride (StaticIntTuple[rank]): A static tuple of size 'rank' representing strides.

__init__(inout self: Self, /, ptr: LegacyPointer[SIMD[type, 1], address_space], dynamic_shape: DimList, dynamic_stride: StaticIntTuple[rank])

Constructs a strided NDBuffer with statically known rank, but dynamic shapes and type.

Constraints:

The rank is known.

Args:

  • ptr (LegacyPointer[SIMD[type, 1], address_space]): Pointer to the data.
  • dynamic_shape (DimList): A DimList of size 'rank' representing shapes.
  • dynamic_stride (StaticIntTuple[rank]): A static tuple of size 'rank' representing strides.

__init__(inout self: Self, /, ptr: DTypePointer[type, address_space], dynamic_shape: StaticIntTuple[rank], dynamic_stride: StaticIntTuple[rank])

Constructs a strided NDBuffer with statically known rank, but dynamic shapes and type.

Constraints:

The rank is known.

Args:

  • ptr (DTypePointer[type, address_space]): Pointer to the data.
  • dynamic_shape (StaticIntTuple[rank]): A static tuple of size 'rank' representing shapes.
  • dynamic_stride (StaticIntTuple[rank]): A static tuple of size 'rank' representing strides.

__init__(inout self: Self, /, ptr: DTypePointer[type, address_space], dynamic_shape: DimList, dynamic_stride: StaticIntTuple[rank])

Constructs a strided NDBuffer with statically known rank, but dynamic shapes and type.

Constraints:

The rank is known.

Args:

  • ptr (DTypePointer[type, address_space]): Pointer to the data.
  • dynamic_shape (DimList): A DimList of size 'rank' representing shapes.
  • dynamic_stride (StaticIntTuple[rank]): A static tuple of size 'rank' representing strides.

__getitem__

__getitem__(self: Self, *idx: Int) -> SIMD[type, 1]

Gets an element from the buffer at the specified index.

Args:

  • *idx (Int): Index of the element to retrieve.

Returns:

The value of the element.

__getitem__(self: Self, idx: StaticIntTuple[rank]) -> SIMD[type, 1]

Gets an element from the buffer at the specified index.

Args:

  • idx (StaticIntTuple[rank]): Index of the element to retrieve.

Returns:

The value of the element.
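Element access on a strided buffer usually reduces to a dot product of the index with the strides. The Python sketch below illustrates that arithmetic on a flat list. It assumes strides are measured in elements, and element_offset is a hypothetical helper, not part of the NDBuffer API.

```python
def element_offset(idx, strides):
    """Flat offset of an N-D index, assuming strides are counted in elements."""
    if len(idx) != len(strides):
        raise ValueError("index rank must match buffer rank")
    return sum(i * s for i, s in zip(idx, strides))


# A 2x3x4 buffer stored densely in row-major order.
data = list(range(24))
strides = (12, 4, 1)

offset = element_offset((1, 2, 3), strides)  # 1*12 + 2*4 + 3*1
print(offset, data[offset])
```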

__setitem__

__setitem__(self: Self, idx: StaticIntTuple[rank], val: SIMD[type, 1])

Stores a single value into the buffer at the specified index.

Args:

  • idx (StaticIntTuple[rank]): The index into the Buffer.
  • val (SIMD[type, 1]): The value to store.

__add__

__add__(self: Self, rhs: NDBuffer[type, rank, shape, address_space]) -> Self

Adds an NDBuffer.

Args:

  • rhs (NDBuffer[type, rank, shape, address_space]): The RHS of the add operation.

Returns:

The addition result.

__sub__

__sub__(self: Self, rhs: Self) -> Self

Subtracts an NDBuffer.

Args:

  • rhs (Self): The RHS of the sub operation.

Returns:

The subtraction result.

__sub__[rhs_shape: DimList](self: Self, rhs: NDBuffer[type, 1, rhs_shape, 0]) -> Self

Subtracts a rank-1 NDBuffer.

Parameters:

  • rhs_shape (DimList): Shape of RHS.

Args:

  • rhs (NDBuffer[type, 1, rhs_shape, 0]): The RHS of the sub operation.

Returns:

The subtraction result.

__mul__

__mul__(self: Self, rhs: Self) -> Self

Multiplies by an NDBuffer.

Args:

  • rhs (Self): The RHS of the mul operation.

Returns:

The multiplication result.

__imul__

__imul__(inout self: Self, rhs: SIMD[float32, 1])

Multiplies the buffer by a scalar in place.

Args:

  • rhs (SIMD[float32, 1]): The RHS of the mul operation.

__imul__(inout self: Self, rhs: NDBuffer[type, rank, shape, address_space])

Multiplies the buffer by an NDBuffer in place.

Args:

  • rhs (NDBuffer[type, rank, shape, address_space]): The RHS of the mul operation.

__itruediv__

__itruediv__(inout self: Self, rhs: NDBuffer[type, rank, shape, address_space])

Divides the buffer by an NDBuffer in place.

Args:

  • rhs (NDBuffer[type, rank, shape, address_space]): The RHS of the div operation.

get_rank

get_rank(self: Self) -> Int

Returns the rank of the buffer.

Returns:

The rank of NDBuffer.

get_shape

get_shape(self: Self) -> StaticIntTuple[rank]

Returns the shapes of the buffer.

Returns:

A static tuple of size 'rank' representing shapes of the NDBuffer.

get_nd_index

get_nd_index(self: Self, idx: Int) -> StaticIntTuple[rank]

Computes the NDBuffer's ND-index based on the flat index.

Args:

  • idx (Int): The flat index.

Returns:

The index positions.
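Converting a flat index to an N-D index is the inverse of the offset calculation. The Python sketch below shows one common way to do it, assuming a row-major ordering in which the last dimension varies fastest. The page does not state the ordering, and nd_index is a hypothetical name.

```python
def nd_index(flat_idx, shape):
    """N-D index for a flat index, assuming row-major order (last axis fastest)."""
    idx = [0] * len(shape)
    # Peel off one coordinate per axis, starting from the innermost one.
    for axis in range(len(shape) - 1, -1, -1):
        idx[axis] = flat_idx % shape[axis]
        flat_idx //= shape[axis]
    return tuple(idx)


print(nd_index(23, (2, 3, 4)))
print(nd_index(5, (2, 3, 4)))
```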

__len__

__len__(self: Self) -> Int

Computes the NDBuffer's number of elements.

Returns:

The total number of elements in the NDBuffer.

num_elements

num_elements(self: Self) -> Int

Computes the NDBuffer's number of elements.

Returns:

The total number of elements in the NDBuffer.

size

size(self: Self) -> Int

Computes the NDBuffer's number of elements.

Returns:

The total number of elements in the NDBuffer.

__str__

__str__(self: Self) -> String

Gets the buffer as a string.

Returns:

A compact string of the buffer.

__repr__

__repr__(self: Self) -> String

Gets the buffer as a string.

Returns:

A compact string representation of the buffer.

tile

tile[*tile_sizes: Dim](self: Self, tile_coords: StaticIntTuple[rank]) -> NDBuffer[type, rank, $0, address_space]

Returns an n-d tile "slice" of the buffer of size tile_sizes at tile_coords.

Parameters:

  • *tile_sizes (Dim): The size of the tiles.

Args:

  • tile_coords (StaticIntTuple[rank]): The tile index.

Returns:

The tiled buffer at tile_coords.
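One plausible reading of tile is that tile_coords counts whole tiles, so the tile's origin in the parent buffer is tile_coords multiplied element-wise by tile_sizes, and the tile reuses the parent's strides. The Python sketch below lists the flat offsets such a tile would cover. Both the interpretation and the name tile_offsets are assumptions, not confirmed by this page.

```python
from itertools import product


def tile_offsets(strides, tile_coords, tile_sizes):
    """Flat offsets covered by one tile, assuming origin = tile_coords * tile_sizes."""
    origin = [c * t for c, t in zip(tile_coords, tile_sizes)]
    base = sum(o * s for o, s in zip(origin, strides))
    # The tile keeps the parent's strides, so local indices use them too.
    return [
        base + sum(i * s for i, s in zip(local, strides))
        for local in product(*(range(t) for t in tile_sizes))
    ]


# A dense 4x6 parent buffer split into 2x3 tiles; pick the tile at (1, 1).
offsets = tile_offsets(strides=(6, 1), tile_coords=(1, 1), tile_sizes=(2, 3))
print(offsets)
```

Because the offsets of consecutive tile rows are not adjacent, a tile of a contiguous buffer is generally not contiguous itself.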

load

load[*, width: Int = 1, alignment: Int = alignof[stdlib::builtin::dtype::DType,__mlir_type.!kgen.target]() if triple_is_nvidia_cuda() else 1](self: Self, *idx: Int) -> SIMD[type, width]

Loads a simd value from the buffer at the specified index.

Constraints:

The buffer must be contiguous or width must be 1.

Parameters:

  • width (Int): The simd_width of the load.
  • alignment (Int): The alignment value.

Args:

  • *idx (Int): The index into the NDBuffer.

Returns:

The simd value starting at the idx position and ending at idx+width.

load[*, width: Int = 1, alignment: Int = alignof[stdlib::builtin::dtype::DType,__mlir_type.!kgen.target]() if triple_is_nvidia_cuda() else 1](self: Self, idx: VariadicList[Int]) -> SIMD[type, width]

Loads a simd value from the buffer at the specified index.

Constraints:

The buffer must be contiguous or width must be 1.

Parameters:

  • width (Int): The simd_width of the load.
  • alignment (Int): The alignment value.

Args:

  • idx (VariadicList[Int]): The index into the NDBuffer.

Returns:

The simd value starting at the idx position and ending at idx+width.

load[*, width: Int = 1, alignment: Int = alignof[stdlib::builtin::dtype::DType,__mlir_type.!kgen.target]() if triple_is_nvidia_cuda() else 1](self: Self, idx: StaticIntTuple[rank]) -> SIMD[type, width]

Loads a simd value from the buffer at the specified index.

Constraints:

The buffer must be contiguous or width must be 1.

Parameters:

  • width (Int): The simd_width of the load.
  • alignment (Int): The alignment value.

Args:

  • idx (StaticIntTuple[rank]): The index into the NDBuffer.

Returns:

The simd value starting at the idx position and ending at idx+width.

load[*, width: Int = 1, alignment: Int = alignof[stdlib::builtin::dtype::DType,__mlir_type.!kgen.target]() if triple_is_nvidia_cuda() else 1](self: Self, idx: StaticTuple[Int, rank]) -> SIMD[type, width]

Loads a simd value from the buffer at the specified index.

Constraints:

The buffer must be contiguous or width must be 1.

Parameters:

  • width (Int): The simd_width of the load.
  • alignment (Int): The alignment value.

Args:

  • idx (StaticTuple[Int, rank]): The index into the NDBuffer.

Returns:

The simd value starting at the idx position and ending at idx+width.
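The constraint on load (contiguous buffer, or width of 1) follows from the fact that a vector load reads width consecutive elements from memory. The Python sketch below models that behavior on a flat list. The function simd_load and its contiguous flag are hypothetical stand-ins, and strides are assumed to be in elements.

```python
def simd_load(data, strides, idx, width=1, contiguous=True):
    """Model of a vector load: `width` consecutive elements starting at `idx`."""
    if width != 1 and not contiguous:
        # Mirrors the documented constraint: wide loads need contiguous storage.
        raise ValueError("buffer must be contiguous or width must be 1")
    start = sum(i * s for i, s in zip(idx, strides))
    return data[start:start + width]


data = list(range(24))          # dense 2x3x4 buffer
strides = (12, 4, 1)

print(simd_load(data, strides, (1, 2, 0), width=4))
print(simd_load(data, strides, (0, 1, 2)))
```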

store

store[*, width: Int = 1, alignment: Int = alignof[stdlib::builtin::dtype::DType,__mlir_type.!kgen.target]() if triple_is_nvidia_cuda() else 1](self: Self, idx: StaticIntTuple[rank], val: SIMD[type, width])

Stores a simd value into the buffer at the specified index.

Constraints:

The buffer must be contiguous or width must be 1.

Parameters:

  • width (Int): The width of the simd vector.
  • alignment (Int): The alignment value.

Args:

  • idx (StaticIntTuple[rank]): The index into the Buffer.
  • val (SIMD[type, width]): The value to store.

store[*, width: Int = 1, alignment: Int = alignof[stdlib::builtin::dtype::DType,__mlir_type.!kgen.target]() if triple_is_nvidia_cuda() else 1](self: Self, idx: StaticTuple[Int, rank], val: SIMD[type, width])

Stores a simd value into the buffer at the specified index.

Constraints:

The buffer must be contiguous or width must be 1.

Parameters:

  • width (Int): The width of the simd vector.
  • alignment (Int): The alignment value.

Args:

  • idx (StaticTuple[Int, rank]): The index into the Buffer.
  • val (SIMD[type, width]): The value to store.

simd_nt_store

simd_nt_store[width: Int](self: Self, idx: StaticIntTuple[rank], val: SIMD[type, width])

Stores a simd value using non-temporal store.

Constraints:

The buffer must be contiguous. The address must be properly aligned, 64B for avx512, 32B for avx2, and 16B for avx.

Parameters:

  • width (Int): The width of the simd vector.

Args:

  • idx (StaticIntTuple[rank]): The index into the Buffer.
  • val (SIMD[type, width]): The value to store.

simd_nt_store[width: Int](self: Self, idx: StaticTuple[Int, rank], val: SIMD[type, width])

Stores a simd value using non-temporal store.

Constraints:

The buffer must be contiguous. The address must be properly aligned, 64B for avx512, 32B for avx2, and 16B for avx.

Parameters:

  • width (Int): The width of the simd vector.

Args:

  • idx (StaticTuple[Int, rank]): The index into the Buffer.
  • val (SIMD[type, width]): The value to store.

dim

dim[index: Int](self: Self) -> Int

Gets the buffer dimension at the given index.

Parameters:

  • index (Int): The index of the dimension to get.

Returns:

The buffer size at the given dimension.

dim(self: Self, index: Int) -> Int

Gets the buffer dimension at the given index.

Args:

  • index (Int): The index of the dimension to get.

Returns:

The buffer size at the given dimension.

stride

stride(self: Self, index: Int) -> Int

Gets the buffer stride at the given index.

Args:

  • index (Int): The index of the dimension to get the stride for.

Returns:

The stride at the given dimension.

flatten

flatten(self: Self) -> Buffer[type, Dim(), address_space]

Constructs a flattened Buffer counterpart for this NDBuffer.

Constraints:

The buffer must be contiguous.

Returns:

Constructed Buffer object.

make_dims_unknown

make_dims_unknown(self: Self) -> NDBuffer[type, rank, create_unknown[stdlib::builtin::int::Int](), address_space]

Rebinds the NDBuffer to one with unknown shape.

Returns:

The rebound NDBuffer with unknown shape.

bytecount

bytecount(self: Self) -> Int

Returns the size of the NDBuffer in bytes.

Returns:

The size of the NDBuffer in bytes.
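The element count reported by __len__, num_elements, and size is the product of the shape, and the byte count is presumably that product times the element size. The Python sketch below shows the arithmetic. The itemsize argument is an assumption standing in for the size of the buffer's DType, and both helper names are hypothetical.

```python
import math


def num_elements(shape):
    """Total element count: the product of all dimensions."""
    return math.prod(shape)


def bytecount(shape, itemsize):
    """Size in bytes, assuming `itemsize` bytes per element and no padding."""
    return num_elements(shape) * itemsize


shape = (2, 3, 4)
print(num_elements(shape))          # 24 elements
print(bytecount(shape, itemsize=4))  # e.g. a 4-byte float32 element
```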

zero

zero(self: Self)

Sets all bytes of the NDBuffer to 0.

Constraints:

The buffer must be contiguous.

tofile

tofile(self: Self, path: Path)

Write values to a file.

Args:

  • path (Path): Path to the output file.

fill

fill(self: Self, val: SIMD[type, 1])

Assigns val to all elements in the Buffer.

The fill is performed in chunks of size N, where N is the native SIMD width of type on the system.

Args:

  • val (SIMD[type, 1]): The value to store.
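Filling in chunks of the native SIMD width can be pictured as a vectorized main loop followed by a remainder. The Python sketch below models that pattern on a list. The scalar tail loop is an assumption about how leftover elements are handled, and chunked_fill is a hypothetical name.

```python
def chunked_fill(data, val, simd_width):
    """Fill `data` with `val` in chunks of `simd_width`, then finish the tail."""
    n = len(data)
    vector_end = n - n % simd_width
    # Main loop: one full-width "vector store" per iteration.
    for start in range(0, vector_end, simd_width):
        data[start:start + simd_width] = [val] * simd_width
    # Tail: elements that do not fill a whole vector (assumed scalar stores).
    for i in range(vector_end, n):
        data[i] = val


buf = [0] * 10
chunked_fill(buf, 7, simd_width=4)
print(buf)
```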

aligned_stack_allocation

static aligned_stack_allocation[alignment: Int]() -> Self

Constructs an NDBuffer instance backed by stack allocated memory space.

Parameters:

  • alignment (Int): Address alignment requirement for the allocation.

Returns:

Constructed NDBuffer with the allocated space.

stack_allocation

static stack_allocation() -> Self

Constructs an NDBuffer instance backed by stack allocated memory space.

Returns:

Constructed NDBuffer with the allocated space.

prefetch

prefetch[params: PrefetchOptions](self: Self, *idx: Int)

Prefetches the data at the given index.

Parameters:

  • params (PrefetchOptions): The prefetch configuration.

Args:

  • *idx (Int): The N-D index of the prefetched location.

prefetch[params: PrefetchOptions](self: Self, indices: StaticIntTuple[rank])

Prefetches the data at the given index.

Parameters:

  • params (PrefetchOptions): The prefetch configuration.

Args:

  • indices (StaticIntTuple[rank]): The N-D index of the prefetched location.