Mojo struct
TileTensor
@register_passable(trivial)
struct TileTensor[mut: Bool, //, dtype: DType, LayoutType: TensorLayout, origin: Origin[mut=mut], *, address_space: AddressSpace = AddressSpace.GENERIC, linear_idx_type: DType = _get_index_type(address_space), element_size: Int = 1]
A tensor type with trait-based layouts supporting nested and hierarchical indexing.
TileTensor provides a flexible abstraction for multi-dimensional data with
layouts expressed via the TensorLayout trait. Unlike LayoutTensor, which
uses a concrete Layout type, TileTensor accepts any type implementing
TensorLayout, enabling more flexible compile-time layout composition.
When to use TileTensor vs LayoutTensor:
- Use `TileTensor` when you need trait-based layout composition, nested layouts, or when working with the newer `Coord`-based layout system.
- Use `LayoutTensor` when you need established operations like `copy_dma`, `collective_load`, or compatibility with existing code using `IntTuple`-based layouts.
- Both types can interoperate via `to_layout_tensor()`.
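A minimal interop sketch, assuming only the `to_layout_tensor()` conversion named above (its exact signature is not shown on this page):

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(0.0)

# Convert to a LayoutTensor view for APIs that expect the
# IntTuple-based layout system (e.g. copy_dma, collective_load).
var lt = tensor.to_layout_tensor()
```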
Example:
```mojo
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx

# Create a 4x4 tensor with row-major layout
var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(0.0)

# Access elements using flat indices
tensor[0, 0] = 1.0
tensor[1, 2] = 2.0

# Extract a 2x2 tile at position (1, 0)
var tile = tensor.tile[2, 2](1, 0)

# Vectorize for SIMD operations (shape becomes 4x1, element size 1x4)
var vec = tensor.vectorize[1, 4]()
```

Parameters
- mut (`Bool`): The inferred mutability of the underlying pointer.
- dtype (`DType`): The data type of tensor elements (e.g., `DType.float32`).
- LayoutType (`TensorLayout`): A type implementing `TensorLayout` that defines the tensor's shape and stride structure. Common types include `Layout` (with `Coord`-based shapes/strides) and `RowMajorLayout`.
- origin (`Origin`): The origin of the underlying pointer for lifetime tracking.
- address_space (`AddressSpace`): Memory address space (GENERIC, SHARED, CONSTANT, etc.). Defaults to GENERIC.
- linear_idx_type (`DType`): Integer type for memory indexing. Defaults to int32 for shared/constant memory, int64 otherwise.
- element_size (`Int`): The number of scalar elements per logical element after vectorization. Defaults to 1.
Fields
- ptr (`UnsafePointer[Scalar[dtype], origin, address_space=address_space]`): Pointer to the tensor's underlying data storage.
- layout (`LayoutType`): The layout instance defining shape and stride mappings.
Implemented traits
AnyType,
Copyable,
DevicePassable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable,
Writable
comptime members
__copy_ctor_is_trivial
comptime __copy_ctor_is_trivial = LayoutType.__copy_ctor_is_trivial
__del__is_trivial
comptime __del__is_trivial = LayoutType.__del__is_trivial
__move_ctor_is_trivial
comptime __move_ctor_is_trivial = LayoutType.__move_ctor_is_trivial
AddressSpaceCastType
comptime AddressSpaceCastType[address_space: AddressSpace] = TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type]
Type alias for address-space-cast result tensors.
Parameters
- address_space (`AddressSpace`): The address space for the result tensor.
all_dims_known
comptime all_dims_known = LayoutType.all_dims_known
True if both shape and stride are fully known at compile time.
Required for operations like vectorize() and distribute().
CoalescedType
comptime CoalescedType = TileTensor[dtype, Layout[ComptimeInt[Coord[LayoutType._shape_types].static_product], ComptimeInt[1]], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Type alias for coalesced (flattened to rank-1) tensor types.
The coalesced tensor has:
- shape: product of all original dimensions
- stride: 1 (contiguous)
- element shape: product of all original element dimensions
- element stride: 1 (contiguous)
device_type
comptime device_type = TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Device-side type for GPU kernel parameter passing.
DynamicType
comptime DynamicType[dyn_dtype: DType] = TileTensor[dtype, Layout[#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, RuntimeInt[dyn_dtype])), #kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, RuntimeInt[dyn_dtype]))], origin, address_space=address_space, linear_idx_type=linear_idx_type]
Type alias for dynamic tensor types.
Parameters
- dyn_dtype (`DType`): The data type for `RuntimeInt` values in the dynamic tensor.
ElementType
comptime ElementType = SIMD[dtype, element_size]
The SIMD type used for element access.
For scalar tensors, this is SIMD[dtype, 1] (equivalent to Scalar[dtype]).
For vectorized tensors, this reflects the vector width.
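As an illustrative sketch assembled from the vectorize example earlier on this page (the exact element-assignment form is an assumption): after `vectorize[1, 4]()`, element accesses read and write `SIMD[DType.float32, 4]` values instead of scalars.

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(0.0)

# Scalar tensor: ElementType is SIMD[DType.float32, 1]
tensor[0, 0] = 1.0

# Vectorized view: shape becomes 4x1, ElementType is SIMD[DType.float32, 4]
var vec = tensor.vectorize[1, 4]()
vec[0, 0] = SIMD[DType.float32, 4](2.0)
```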
flat_rank
comptime flat_rank = Variadic.size[CoordLike](#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[idx].VariadicType if VA[idx].is_tuple else VA[idx])))
The flattened rank - total number of dimensions after flattening nested Coords.
For non-nested layouts, flat_rank == rank. For nested layouts (e.g., from blocked_product), flat_rank > rank.
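A concrete sketch of the distinction (the nested-layout shape is hypothetical; `blocked_product`'s signature is not shown here):

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)

# Non-nested layout: rank == 2 and flat_rank == 2
var t = TileTensor(storage, row_major[4, 4]())
t[1, 2] = 3.0  # indexing always takes flat_rank indices

# A nested layout with rank == 2 but flat_rank == 4 (e.g. produced by
# blocked_product) would instead be indexed as t[i0, i1, i2, i3].
```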
GenericType
comptime GenericType = TileTensor[dtype, LayoutType, origin, linear_idx_type=linear_idx_type]
Type alias for this tensor with GENERIC address space.
Used by constructors that create tensors from Span, DeviceBuffer, or HostBuffer, which all produce GENERIC address space tensors.
is_row_major
comptime is_row_major = (Coord[#kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[1 if (VA[idx].static_value == #kgen.variadic.reduce(#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[(add (mul idx, -1), len(VA), -1)])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(ComptimeInt[1] if (idx == 0)._mlir_value else RuntimeInt[VA[(add idx, -1)].DTYPE if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].DTYPE] if VA[(add idx, -1)].is_static_value.__bool__().__invert__() if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].is_static_value.__bool__().__invert__() else ComptimeInt[(VA[(add idx, -1)].static_value * PrevV[0].static_value)], PrevV))[idx].static_value)._mlir_value else 0]))].static_product == 1 if (Variadic.size[CoordLike](LayoutType._shape_types) == 0)._mlir_value else Coord[#kgen.variadic.splat(ComptimeInt[1], Variadic.size[CoordLike](LayoutType._shape_types)._mlir_value)].static_product)
True if the tensor has row-major (contiguous) strides.
OriginCastType
comptime OriginCastType[mut: Bool, //, origin: Origin[mut=mut]] = TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type]
Type alias for origin-cast result tensors.
Parameters
- mut (`Bool`): The inferred mutability of the new origin.
- origin (`Origin`): The origin for the result tensor.
rank
comptime rank = LayoutType.rank
The number of dimensions in the tensor's layout.
ReshapedType
comptime ReshapedType[*new_shape_types: CoordLike] = TileTensor[dtype, Layout[new_shape_types, #kgen.variadic.reduce(#kgen.variadic.reduce(new_shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[(add (mul idx, -1), len(VA), -1)])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(ComptimeInt[1] if (idx == 0)._mlir_value else RuntimeInt[VA[(add idx, -1)].DTYPE if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].DTYPE] if VA[(add idx, -1)].is_static_value.__bool__().__invert__() if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].is_static_value.__bool__().__invert__() else ComptimeInt[(VA[(add idx, -1)].static_value * PrevV[0].static_value)], PrevV))], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Type alias for reshaped tensor types.
Parameters
- *new_shape_types (`CoordLike`): The shape types for the reshaped tensor.
shape_known
comptime shape_known = LayoutType.shape_known
True if all shape dimensions are compile-time constants.
SIMDVectorizedType
comptime SIMDVectorizedType = TileTensor[dtype, Layout[#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(((VA[idx].static_value + ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()][idx].static_value) - 1) // ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()][idx].static_value)])), #kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(VA[idx].static_value * ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()][idx].static_value)]))], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=Coord[ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()]].static_product]
Result type for SIMD-width vectorization.
static_shape
comptime static_shape[i: Int] = LayoutType.static_shape[i]
Get the compile-time shape value for dimension i, or -1 if dynamic.
Parameters
- i (`Int`): The dimension index.
static_stride
comptime static_stride[i: Int] = LayoutType.static_stride[i]
Get the compile-time stride value for dimension i, or -1 if dynamic.
Parameters
- i (`Int`): The dimension index.
stride_known
comptime stride_known = LayoutType.stride_known
True if all stride dimensions are compile-time constants.
VectorizedType
comptime VectorizedType[*vector_shape: Int] = TileTensor[dtype, Layout[#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(((VA[idx].static_value + #kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))[idx].static_value) - 1) // #kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))[idx].static_value)])), #kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(VA[idx].static_value * #kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))[idx].static_value)]))], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=Coord[#kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))].static_product]
Type alias for vectorized tensor types.
Parameters
- *vector_shape (`Int`): The shape of each vector unit along each axis.
ViewType
comptime ViewType[new_layout: TensorLayout] = TileTensor[dtype, new_layout, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
A TileTensor type with the same data properties but a different layout.
Preserves dtype, origin, address_space, and other properties while replacing LayoutType. Use this to name the return type of reshape() and other layout-changing operations in helper functions.
Parameters
- new_layout (`TensorLayout`): The new TensorLayout type for the view.
Methods
__init__
__init__(var span: Span[Scalar[dtype], origin], var layout: LayoutType) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].GenericType
Create a TileTensor from a Span and layout.
Args:
- span (`Span`): The memory span containing the tensor data.
- layout (`LayoutType`): The layout defining the tensor's shape and strides.
Returns:
TileTensor: The constructed tensor viewing the span's data.
__init__(buffer: NDBuffer[buffer.dtype, buffer.rank, buffer.origin, buffer.shape, buffer.strides, alignment2=buffer.alignment2, address_space=buffer.address_space, exclusive=buffer.exclusive]) -> TileTensor[buffer.dtype, Layout[#kgen.variadic.reduce(buffer.shape.value.value, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Dim], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]._value_or_missing] if (VA[idx] != -31337)._mlir_value else RuntimeInt[DType.int64])), #kgen.variadic.reduce(#kgen.variadic.reduce(#kgen.variadic.reduce(buffer.shape.value.value, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Dim], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]._value_or_missing] if (VA[idx] != -31337)._mlir_value else RuntimeInt[DType.int64])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[(add (mul idx, -1), len(VA), -1)])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(ComptimeInt[1] if (idx == 0)._mlir_value else RuntimeInt[VA[(add idx, -1)].DTYPE if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].DTYPE] if VA[(add idx, -1)].is_static_value.__bool__().__invert__() if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].is_static_value.__bool__().__invert__() else ComptimeInt[(VA[(add idx, -1)].static_value * PrevV[0].static_value)], PrevV))], buffer.origin, address_space=buffer.address_space]
Create a TileTensor from an NDBuffer.
Converts an NDBuffer to a TileTensor, preserving shape and stride
information. Static dimensions in the NDBuffer become ComptimeInt,
dynamic dimensions become RuntimeInt. Strides are computed as
row-major from the shape types via RowMajorLayout, recovering
static stride info that NDBuffer's default all-unknown strides
would lose.
Args:
- buffer (`NDBuffer`): The NDBuffer to convert.
Returns:
TileTensor: The constructed tensor viewing the buffer's data.
__init__(ref[origin] device_buffer: DeviceBuffer[dtype], var layout: LayoutType) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].GenericType
Create a TileTensor from a DeviceBuffer. The layout must have statically known dimensions.
Note that the device buffer memory is on the accelerator device (GPU global memory). Code running on the CPU can use the DeviceContext to allocate a DeviceBuffer and use that to construct a TileTensor that can be accessed on the GPU. You cannot directly access data in the DeviceBuffer or TileTensor from the CPU.
The following example shows a typical pattern for using DeviceBuffer to construct a TileTensor that you can use on the GPU.
```mojo
from std.gpu.host import DeviceContext, DeviceBuffer
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx

comptime dtype = DType.float32
var ctx = DeviceContext()

# Allocate buffers
var dev_buf = ctx.enqueue_create_buffer[dtype](16)
var host_buf = ctx.enqueue_create_host_buffer[dtype](16)

# Ensure buffers have been created
ctx.synchronize()

# Initialize host buffer and copy to device buffer
for i in range(16):
    host_buf[i] = Scalar[dtype](i)
ctx.enqueue_copy(dev_buf, host_buf)

# Create TileTensor to use on device
var tensor = TileTensor(
    dev_buf,
    row_major((Idx[4](), Idx[4]())),
)
...
```

Args:
- device_buffer (`DeviceBuffer`): Contains the underlying data to point to.
- layout (`LayoutType`): The layout of the tensor.
Returns:
TileTensor: The constructed tensor viewing the device buffer's data.
__init__(ref[origin] host_buffer: HostBuffer[dtype], var layout: LayoutType) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].GenericType
Create a TileTensor from a HostBuffer. The layout must have statically known dimensions.
The resulting tensor's data can only be accessed on the CPU.
```mojo
from std.gpu.host import DeviceContext, HostBuffer
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx

comptime dtype = DType.float32
var ctx = DeviceContext()

# Allocate 16 elements to back the 4x4 layout below
var host_buf = ctx.enqueue_create_host_buffer[dtype](16)
var tensor = TileTensor(
    host_buf,
    row_major((Idx[4](), Idx[4]())),
)
```

Args:
- host_buffer (`HostBuffer`): Contains the underlying data to point to.
- layout (`LayoutType`): The layout of the tensor.
Returns:
TileTensor: The constructed tensor viewing the host buffer's data.
@implicit
__init__(other: TileTensor[other.dtype, other.LayoutType, other.origin, address_space=other.address_space, linear_idx_type=other.linear_idx_type, element_size=other.element_size]) -> TileTensor[other.dtype, other.LayoutType, origin_of(other.origin), address_space=other.address_space, linear_idx_type=other.linear_idx_type, element_size=other.element_size]
Implicitly cast a mutable TileTensor to immutable.
Args:
- other (`TileTensor`): The mutable TileTensor to cast from.
Returns:
TileTensor: An immutable view of the same tensor.
__getitem__
__getitem__(self, coord: Coord[coord.element_types]) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].ElementType where (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Retrieve a single element from the tensor at the specified coordinates.
Accepts Coords of flat_rank (flattened).
Args:
- coord (`Coord`): The coordinates specifying the element's position.
Returns:
ElementType: The element at the specified position.
__getitem__[*IndexTypes: Indexer & Copyable](self, *items: *IndexTypes) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].ElementType where (Variadic.size[Indexer & Copyable](IndexTypes) == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Retrieves a single element from the tensor at the specified indices.
Uses flat indexing based on flat_rank. For non-nested layouts, flat_rank == rank, so tensor[i, j, k] works normally. For nested layouts (e.g., from blocked_product), use all flat_rank indices: tensor[i0, i1, i2, i3] for a tensor with flat_rank == 4.
Parameters:
- *IndexTypes (`Indexer & Copyable`): The types of the index arguments.
Args:
- *items (`*IndexTypes`): The indices specifying the element's position.
Returns:
ElementType: The element at the specified position.
__setitem__
__setitem__(self, coord: Coord[coord.element_types], value: SIMD[dtype, element_size]) where mut if (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)._mlir_value else (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Set a single element in the tensor at the specified coordinates.
Accepts Coords of flat_rank (flattened).
Args:
- coord (`Coord`): The coordinates specifying the element's position.
- value (`SIMD`): The value to store.
__setitem__[*IndexTypes: Indexer & Copyable](self, *items: *IndexTypes, *, value: SIMD[dtype, element_size]) where ((Variadic.size[Indexer & Copyable](IndexTypes) == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank) & mut)
Sets a single element in the tensor at the specified indices.
Uses flat indexing based on flat_rank. For non-nested layouts, flat_rank == rank, so tensor[i, j, k] = value works normally. For nested layouts (e.g., from blocked_product), use all flat_rank indices: tensor[i0, i1, i2, i3] = value for a tensor with flat_rank == 4.
Parameters:
- *IndexTypes (`Indexer & Copyable`): The types of the index arguments.
Args:
- *items (`*IndexTypes`): The indices specifying the element's position.
- value (`SIMD`): The value to store.
get_type_name
static get_type_name() -> String
Gets the name of the host type (the one implementing this trait).
Returns:
String: The host type's name.
load
load[width: Int = element_size, alignment: Int = align_of[SIMD[dtype, width]](), invariant: Bool = False](self, coord: Coord[coord.element_types]) -> SIMD[dtype, width] where (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank) if (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)._mlir_value else (Coord[coord.element_types].flat_rank == 1)
Load elements from the tensor at the specified coordinates.
Supports both hierarchical indexing (rank indices) and flat indexing (flat_rank indices) for nested layouts.
Parameters:
- width (`Int`): Number of elements to load (default: element_size).
- alignment (`Int`): Memory alignment for the load.
- invariant (`Bool`): If True, the compiler may assume the memory won't be modified during the kernel, enabling load hoisting and caching.
Args:
- coord (`Coord`): The coordinates specifying the element's position.
Returns:
SIMD: A SIMD vector containing the loaded elements.
store
store[width: Int = element_size, alignment: Int = align_of[SIMD[dtype, width]]()](self, coord: Coord[coord.element_types], value: SIMD[dtype, width]) where mut if (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)._mlir_value else (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Store elements to the tensor at the specified coordinates.
Supports both hierarchical indexing (rank indices) and flat indexing (flat_rank indices) for nested layouts.
Parameters:
- width (`Int`): Number of elements to store (default: element_size).
- alignment (`Int`): Memory alignment for the store.
Args:
- coord (`Coord`): The coordinates specifying the element's position.
- value (`SIMD`): The value to store.
ptr_at_offset
ptr_at_offset(self, coords: Coord[coords.element_types]) -> UnsafePointer[Scalar[dtype], origin, address_space=address_space] where (Coord[coords.element_types].rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].rank)
Get a pointer offset at the given flattened coordinates.
Args:
- coords (`Coord`): A flattened list of the offset coordinates.
Returns:
UnsafePointer: A pointer offset at the given flattened coordinates.
prefetch
prefetch(self, coords: Coord[coords.element_types]) where (Coord[coords.element_types].rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].rank)
Prefetch tensor data at the specified coordinates into cache.
Issues a software prefetch hint to the processor to load the data at coords into the cache hierarchy. This can improve performance by reducing memory latency for subsequent accesses to the same location.
Performance:
- Prefetching is a performance hint and does not guarantee data will be cached.
- Most effective when issued sufficiently ahead of the actual data access.
- Uses high locality prefetch to the data cache, optimized for data that will be accessed multiple times.
- Can substantially reduce memory access latency when issued early enough to hide the miss.
Notes:
- Excessive prefetching can pollute the cache and degrade performance.
- Most beneficial for predictable access patterns that would otherwise cause cache misses.
- No operation is performed on the prefetched data.
Args:
- coords (`Coord`): The coordinates of the data to prefetch.
numel
numel(self) -> Int
Returns the total number of elements in the tensor.
Computes the product of all shape dimensions.
Returns:
Int: The total element count.
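For example, assuming the `row_major` helper used elsewhere on this page:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]())

# Product of the shape dimensions: 4 * 4 == 16
print(tensor.numel())
```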
write_to
write_to(self, mut w: T)
Format and write the tensor's contents to a writer.
This method formats the tensor's contents and writes them to the provided writer. For 2D tensors, it formats the output in a 2D grid. For tensors of other ranks, it prints all values in column-major coordinate order.
Example:
```mojo
from layout import TileTensor
from layout.tile_layout import row_major

def main() raises:
    var storage = InlineArray[Float32, 2 * 3](uninitialized=True)
    var tensor = TileTensor(storage, row_major[2, 3]()).fill(1.0)
    # Internally calls `write_to` with a StringWriterOutput
    print(tensor)
```

Output for the 2x3 tensor:

```
[[1.0, 1.0, 1.0],
 [1.0, 1.0, 1.0]]
```

Notes:
- For 2D tensors, the output is formatted as a 2D grid with rows and columns.
- For tensors of other ranks, values are printed in column-major coordinate order.
- Empty tensors (size 0) produce no output.
- This method is used by the `__str__` method to convert the tensor to a string.
- The formatting is designed for human readability rather than parsing.
- For large tensors, the output may be truncated to avoid excessive output.
Args:
- w (`T`): The writer instance to write the formatted output to.
tile
tile[*tile_sizes: Int](self, coordinates: Coord[coordinates.element_types]) -> TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), LayoutType._stride_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Extract a tile (sub-tensor) with the specified shape at the given coordinates.
Parameters:
- *tile_sizes (`Int`): The dimensions of the tile along each axis.
Args:
- coordinates (`Coord`): The tile coordinates as a Coord.
Returns:
TileTensor: A view into the original tensor representing the specified tile.
tile[*tile_sizes: Int, *, stride_layout: TensorLayout](self, coordinates: Coord[coordinates.element_types]) -> TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), stride_layout._shape_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Tile with explicit static strides.
Use when the parent tensor has dynamic (RuntimeInt) strides but the actual stride values are known at compile time. This produces a tile with all_dims_known=True, enabling vectorize/distribute.
This is needed because TensorLayout trait parameters erase concrete stride types -- the compiler cannot prove all_dims_known through a trait-bounded parameter even when the underlying strides are static.
Parameters:
- *tile_sizes (`Int`): Tile dimensions along each axis.
- stride_layout (`TensorLayout`): The layout providing static stride types.
Args:
- coordinates (`Coord`): Tile coordinates in the grid.
Returns:
TileTensor: A view into the original tensor representing the specified tile.
tile[*tile_sizes: Int](self, *tile_coords: Int) -> TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), LayoutType._stride_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Extract a tile (sub-tensor) from this tensor with specified dimensions and position.
This overload accepts tile coordinates as variadic Int arguments, providing API compatibility with LayoutTensor.
Example:
```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(1.0)

# Extract the tile at position (1, 0) with tile size 2x2
var t = tensor.tile[2, 2](1, 0)
```

Parameters:
- *tile_sizes (`Int`): The dimensions of each tile along each axis.

Args:
- *tile_coords (`Int`): The coordinates of the specific tile to extract.
Returns:
TileTensor: A view into the original tensor representing the specified tile.
tile_with_offset
tile_with_offset[*tile_sizes: Int](self, coordinates: Coord[coordinates.element_types]) -> Tuple[TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), LayoutType._stride_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size], IndexList[Variadic.size[CoordLike](coordinates.element_types)], UInt]
Like tile(), but also returns corner coordinates and linear offset.
Parameters:
- *tile_sizes (`Int`): Tile dimensions along each axis.
Args:
- coordinates (`Coord`): Tile coordinates in the grid.
Returns:
Tuple: Tuple of (tile, corner_coords, offset).
tile_with_offset[*tile_sizes: Int, *, stride_layout: TensorLayout](self, coordinates: Coord[coordinates.element_types]) -> Tuple[TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), stride_layout._shape_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size], IndexList[Variadic.size[CoordLike](coordinates.element_types)], UInt]
Like tile(), but with explicit static strides.
Use when the parent has dynamic strides but the values are known at compile time. See tile[stride_layout=...] for details.
Parameters:
- *tile_sizes (`Int`): Tile dimensions along each axis.
- stride_layout (`TensorLayout`): The layout providing static stride types.
Args:
- coordinates (`Coord`): Tile coordinates in the grid.
Returns:
Tuple: Tuple of (tile, corner_coords, offset).
reshape
reshape[new_layout: TensorLayout](self, layout_val: new_layout) -> TileTensor[dtype, new_layout, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Create a view of the tensor with a different layout.
Returns a new TileTensor sharing the same pointer but with a different layout. This is a zero-cost operation -- only the layout type changes, no data is moved.
Parameters:
- new_layout (`TensorLayout`): The target layout type (inferred from layout_val).
Args:
- layout_val (`new_layout`): The layout instance to use for the new view.
Returns:
TileTensor: A TileTensor with the new layout viewing the same memory.
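A sketch of this overload, assuming `row_major` produces a `TensorLayout` instance as in the examples above:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(0.0)

# Zero-cost view: same pointer, new layout inferred from the argument
var view = tensor.reshape(row_major[2, 6]())
```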
reshape[*new_shape: Int](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Reshape the tensor to a new shape with compile-time dimensions.
This method creates a view of the tensor with a different logical shape while preserving the underlying data. The total number of elements must remain the same, and the tensor must have row-major (contiguous) strides.
Example:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(1.0)
# tensor has shape (3, 4)

var reshaped = tensor.reshape[2, 6]()
# reshaped has shape (2, 6), same underlying data

var reshaped_1d = tensor.reshape[12]()
# reshaped_1d has shape (12,), equivalent to coalesce()
```

Performance:
- Creates a view without copying data.
- Zero-cost abstraction at compile time when used with static shapes.
Constraints:
- All dimensions must be statically known (`all_dims_known`).
- The tensor must have row-major strides (`is_row_major`).
- The product of the new shape must equal the product of the original shape.
Parameters:
- *new_shape (`Int`): The new shape dimensions as compile-time integers.
Returns:
TileTensor: A TileTensor with the new shape and row-major strides, sharing
the same underlying data as the original tensor.
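The constraint above can be sketched in Python (illustrative only, not the Mojo API; `reshape_layout` is a hypothetical helper): a reshape is valid only when element counts match, and the resulting view keeps row-major strides derived from the new shape.

```python
import math

# Illustrative sketch of the reshape constraint: element counts must
# match, and the view gets fresh row-major strides for the new shape.
def reshape_layout(old_shape, new_shape):
    assert math.prod(old_shape) == math.prod(new_shape), "element counts must match"
    # Row-major strides: each stride is the product of the dims to its right.
    strides = tuple(math.prod(new_shape[i + 1:]) for i in range(len(new_shape)))
    return new_shape, strides

# A (3, 4) tensor reshaped to (2, 6) keeps 12 elements, strides (6, 1).
print(reshape_layout((3, 4), (2, 6)))
```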
reshape[*new_shape_types: CoordLike](self, new_shape: Coord[new_shape_types]) -> TileTensor[dtype, Layout[new_shape_types, ...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Reshape the tensor to a new shape specified as a Coord.
This method creates a view of the tensor with a different logical shape while preserving the underlying data. The total number of elements must remain the same, and the tensor must have row-major (contiguous) strides.
This overload accepts shapes with runtime dimensions, performing the element count validation at runtime when needed.
Example:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx, Coord

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(1.0)

# Reshape with runtime-determined dimensions
var rows = 2
var cols = 6
var reshaped = tensor.reshape(Coord(Idx(rows), Idx(cols)))
```

Performance:
- Creates a view without copying data.
- May include runtime validation for dynamic shapes.
Constraints:
- The tensor must have row-major strides (`is_row_major`).
- The product of the new shape must equal the product of the original shape (validated at runtime for dynamic shapes).
Parameters:
- *new_shape_types (`CoordLike`): The types of the new shape dimensions (inferred).
Args:
- new_shape (`Coord`): The new shape as a Coord.
Returns:
TileTensor: A TileTensor with the new shape and row-major strides, sharing
the same underlying data as the original tensor.
distribute
distribute[thread_layout: Layout, swizzle: Optional[Swizzle] = None](self, thread_id: Int) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Distribute tensor workload across multiple threads in a structured pattern.
This method partitions a tensor across multiple threads for parallel processing, assigning each thread a specific portion of the tensor. The distribution pattern is determined by the thread_layout parameter, which defines the logical arrangement of threads.
Parameters:
- thread_layout (`Layout`): Defines the logical arrangement of threads (e.g., a 2x2 grid of 4 threads). This layout determines how the tensor is partitioned.
- swizzle (`Optional[Swizzle]`): Optional. A function that remaps the distribution pattern to improve memory access patterns or cache locality.
Args:
- thread_id (`Int`): The ID of the current thread (0-based).
Returns:
TileTensor: A view into the original tensor representing the portion assigned to
this thread.
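The partitioning can be sketched in Python (illustrative only, not the Mojo API; `distribute_view` is a hypothetical helper). Per the return type above, each thread's view has `shape[i] // thread_shape[i]` elements per dimension with strides scaled by the thread shape, so threads interleave across the tensor:

```python
# Illustrative sketch: distribute a row-major tensor across a thread grid.
# Each thread's view: shape[i] //= threads[i], stride[i] *= threads[i].
def distribute_view(shape, strides, threads, thread_id):
    # Thread coordinates in a row-major thread grid.
    t_row, t_col = divmod(thread_id, threads[1])
    new_shape = (shape[0] // threads[0], shape[1] // threads[1])
    new_strides = (strides[0] * threads[0], strides[1] * threads[1])
    # Offset of this thread's first element in the flat buffer.
    offset = t_row * strides[0] + t_col * strides[1]
    # Flat indices owned by this thread.
    return [
        offset + i * new_strides[0] + j * new_strides[1]
        for i in range(new_shape[0])
        for j in range(new_shape[1])
    ]

# A 4x4 row-major tensor (strides (4, 1)) over a 2x2 thread grid:
# thread 0 owns rows {0, 2} x cols {0, 2}, i.e. flat indices [0, 2, 8, 10].
print(distribute_view((4, 4), (4, 1), (2, 2), 0))
```

Together the four threads cover all 16 elements exactly once.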
distribute_with_offset
distribute_with_offset[thread_layout: Layout, swizzle: Optional[Swizzle] = None](self, thread_id: Int) -> Tuple[TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size], IndexList[...], UInt]
Like distribute(), but also returns thread coordinates and offset.
Parameters:
- thread_layout (`Layout`): Defines the logical arrangement of threads.
- swizzle (`Optional[Swizzle]`): Optional swizzle function.
Args:
- thread_id (`Int`): The ID of the current thread (0-based).
Returns:
Tuple: Tuple of (distributed_tensor, thread_coords, offset).
fill
fill[*, use_runtime_layout: Bool = ...](self, val: Scalar[dtype]) -> Self where mut
Fill the entire tensor with a single value.
This method sets all elements of the tensor to the specified value. It works with both statically and dynamically shaped tensors.
For statically known layouts, the fill operation is unrolled at compile time. For dynamic layouts, a runtime loop is used. No vectorization is applied, so performance may be suboptimal for large tensors. Consider using hardware-specific fill operations for better performance with large tensors.
This method can be used with tensors of any rank and shape. The
fill operation respects the tensor's layout, filling all
elements regardless of how they are arranged in memory. For
tensors with element_layout, all elements within each logical element
are filled with the same value.
Example:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

def main() raises:
    var storage = InlineArray[Float32, 3 * 4](uninitialized=True)
    var tensor = TileTensor(storage, row_major[3, 4]()).fill(0.0)
    print(tensor)
```

If not using method chaining, you can either reassign the result to the tensor variable, or assign the result to the discard pattern (`_`) to avoid warnings about an unused value:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 3 * 4](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(0.0)
tensor = tensor.fill(0.0)
# or
_ = tensor.fill(0.0)
```

Parameters:
- use_runtime_layout (`Bool`): Whether to use the runtime layout for filling. Defaults to `True` if the layout is not statically known. If loop bounds are too large, it is better to use the runtime layout to avoid long compilation times.
Args:
- val (`Scalar`): The value to fill the tensor with. Must be of the same data type as the tensor.
Returns:
Self: The tensor itself (self), allowing for method chaining.
dim
dim[i: Int](self) -> Scalar[linear_idx_type]
Returns the size of dimension i.
Parameters:
- i (`Int`): The dimension index (compile-time constant).
Returns:
Scalar: The size of dimension i as a scalar.
dim[IndexType: Indexer](self, index: IndexType) -> Scalar[linear_idx_type]
Returns the size of the specified dimension.
Parameters:
- IndexType (`Indexer`): The type of the index argument.

Args:
- index (`IndexType`): The dimension index (runtime value).
Returns:
Scalar: The size of the specified dimension as a scalar.
dynamic_stride
dynamic_stride[IndexType: Indexer](self, index: IndexType) -> Scalar[linear_idx_type]
Returns the stride of the specified dimension.
Parameters:
- IndexType (`Indexer`): The type of the index argument.

Args:
- index (`IndexType`): The dimension index (runtime value).
Returns:
Scalar: The stride of the specified dimension as a scalar.
slice
slice[*slices: ContiguousSlice](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Extract a slice from the tensor using slice objects.
This method creates a view into a subset of the tensor defined by the slice specifications for each dimension. The slice is a continuous region of the tensor with no gaps (step size must be 1 for all dimensions).
The number of slice arguments must match the tensor rank.
Example:
For a 3D tensor, you can slice all three dimensions:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

comptime layout_3d = row_major[16, 16, 16]()
var stack = InlineArray[UInt8, layout_3d.static_product](fill=0)
var tensor_3d = TileTensor(stack, layout_3d)
var slice = tensor_3d.slice[0:2, 1:3, 0:4]()
```

Performance:
- Creates a view without copying data, making it very efficient.
- Maintains the original tensor's stride information for efficient memory access.
- Zero-cost abstraction at runtime when used with compile-time constant slices.
Notes:
- The slice is a view into the original tensor, so modifications to the slice will affect the original tensor.
- Works with tensors of any rank (must provide one slice per dimension).
- The step size must be 1 for all dimensions (no gaps allowed).
- Slice bounds are not checked at runtime; accessing out-of-bounds indices will result in undefined behavior.
- Shape and stride types are converted to RuntimeInt in the sliced tensor, even if the original tensor had ComptimeInt dimensions. This is necessary because we can't change ComptimeInt[4] to ComptimeInt[2] in the type system.
Parameters:
- *slices (`ContiguousSlice`): Slice specifications for each dimension. Each slice defines the start and end indices for that dimension.
Returns:
TileTensor: A view into the original tensor representing the specified slice.
The returned tensor has the same rank but smaller dimensions.
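The view arithmetic can be sketched in Python (illustrative only, not the Mojo API; `slice_view` is a hypothetical helper): a contiguous slice shrinks each dimension to `end - start`, keeps the original strides, and starts at a base offset of `sum(start[i] * stride[i])`.

```python
# Illustrative sketch of contiguous-slice view arithmetic.
def slice_view(shape, strides, slices):
    new_shape = tuple(end - start for start, end in slices)
    offset = sum(start * stride for (start, _), stride in zip(slices, strides))
    return new_shape, strides, offset

# Slicing [0:2, 1:3, 0:4] from a 16x16x16 row-major tensor
# (strides (256, 16, 1)) gives shape (2, 2, 4) starting at offset 16.
print(slice_view((16, 16, 16), (256, 16, 1), [(0, 2), (1, 3), (0, 4)]))
```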
slice(self, *slices: Tuple[Int, Int]) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Slice tensor with runtime start/end indices.
Unlike slice[]() which requires compile-time bounds, this method
accepts runtime indices for fully dynamic slicing. Each argument is
a (start, end) tuple for that dimension, matching the dimension-major
ordering of the compile-time slice method.
Example:

```mojo
# For a 2D tensor, slice rows 1:3 and columns 2:5
var sliced = tensor.slice((1, 3), (2, 5))
```

Args:
- *slices (`Tuple[Int, Int]`): Variadic (start, end) tuples, one per dimension.
Returns:
TileTensor: A view into the sliced region with RuntimeInt shape.
vectorize
vectorize[*vector_shape: Int](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=...]
Reshape a tensor into a vectorized form for efficient SIMD operations.
This method transforms the tensor's logical layout to enable efficient vectorized processing, treating blocks of elements as vector units. The transformation is particularly useful for SIMD (Single Instruction Multiple Data) operations and hardware acceleration.
The vector shape is tracked in element_size.
Example:
For a 16x16 tensor, vectorize[4, 4] will produce a 4x4 tensor
where each element position is the starting point of a 4x4 block
from the original tensor. The strides are scaled by the vector shape
so that adjacent elements in the vectorized tensor are spaced apart
by the vector dimensions.
Performance:
- Creates a view without copying data, making it very efficient.
- Enables strided access patterns suitable for SIMD vector loads.
- Zero-cost abstraction at compile time when used with static shapes.
Constraints:
All dimensions must be statically known (all_dims_known).
Parameters:
- *vector_shape (`Int`): The dimensions of each vector unit along each axis of the tensor. For example, in a 2D tensor, `vectorize[4, 4]` treats 4x4 blocks as vector units.
Returns:
TileTensor: A view of the tensor with a vectorized layout, where each element in
the resulting tensor represents the start of a vector block from the
original tensor. The element layout is tracked via
element_size (the vector shape).
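The layout transformation can be sketched in Python (illustrative only, not the Mojo API; `vectorize_layout` is a hypothetical helper): each dimension is divided (rounding up) by the vector shape, strides are scaled by it, and `element_size` becomes the product of the vector shape.

```python
import math

# Illustrative sketch of the vectorize[] layout transformation.
def vectorize_layout(shape, strides, vector_shape):
    new_shape = tuple(math.ceil(s / v) for s, v in zip(shape, vector_shape))
    new_strides = tuple(st * v for st, v in zip(strides, vector_shape))
    element_size = math.prod(vector_shape)
    return new_shape, new_strides, element_size

# A 16x16 row-major tensor (strides (16, 1)) vectorized into 4x4 blocks
# becomes a 4x4 tensor of blocks with strides (64, 4), element_size 16.
print(vectorize_layout((16, 16), (16, 1), (4, 4)))
```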
vectorize(self) -> Self.SIMDVectorizedType
Return a SIMD-width vectorized view of this tensor.
This is a convenience method that vectorizes along the last dimension by the SIMD width for the tensor's dtype.
Returns:
TileTensor: A Self.VectorizedType[1, simd_width_of[Self.dtype]()] view whose
last dimension stride equals the SIMD width for the tensor's dtype.
coalesce
coalesce(self) -> Self.CoalescedType
Creates a rank-1 tensor by flattening all dimensions.
Coalescing combines all dimensions into a single contiguous dimension. This is useful for operations that need to iterate over all elements sequentially.
Example:
For a 4x4 tensor, coalesce() produces a 16-element rank-1 tensor.
For a vectorized tensor with shape (4, 4) and element shape (4, 4),
coalescing produces shape (16,) with element shape (16,).
Performance:
- Creates a view without copying data.
- Enables simple sequential iteration over all elements.
- Zero-cost abstraction at compile time.
Constraints:
All dimensions must be statically known (all_dims_known).
The tensor must have row-major (contiguous) strides (is_row_major).
Returns:
TileTensor: A rank-1 tensor with shape equal to the product of all original
dimensions and stride 1. Element layout is also coalesced.
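The resulting layout can be sketched in Python (illustrative only, not the Mojo API; `coalesce_layout` is a hypothetical helper): for a row-major tensor, all dimensions collapse into one contiguous dimension whose length is the product of the original shape.

```python
import math

# Illustrative sketch: coalescing a row-major tensor yields a rank-1
# layout of length prod(shape) with stride 1.
def coalesce_layout(shape):
    return (math.prod(shape),), (1,)

# A 4x4 tensor coalesces to a 16-element rank-1 view.
print(coalesce_layout((4, 4)))
```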
make_dynamic
make_dynamic[dyn_dtype: DType](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type]
Convert all elements in shape and stride to RuntimeInt[dyn_dtype].
Examples:

```mojo
from layout import TileTensor
from layout.tile_layout import row_major

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(Span(storage), row_major[3, 4]())
var dynamic = tensor.make_dynamic[DType.int64]()
# dynamic has RuntimeInt[DType.int64] for all shape/stride dimensions
```

Parameters:
- dyn_dtype (`DType`): The data type for the resulting RuntimeInt values.
Returns:
TileTensor: A new TileTensor where all elements in shape and stride
are converted to RuntimeInt[dyn_dtype].
to_layout_tensor
to_layout_tensor(self) -> LayoutTensor[dtype, Layout(coord_to_int_tuple[LayoutType._shape_types](), coord_to_int_tuple[LayoutType._stride_types]()), origin, address_space=address_space]
Return a LayoutTensor with the same shape, stride, and address space of this tensor. Currently it expects flat layouts.
This is a utility to help with porting LayoutTensor methods to this type.
Returns:
LayoutTensor: A LayoutTensor with the same shape, stride, and address space of
this tensor.
as_any_origin
as_any_origin(self) -> TileTensor[dtype, LayoutType, AnyOrigin[mut=mut._mlir_value], address_space=address_space, linear_idx_type=linear_idx_type]
Casts the origin of this mutable TileTensor to MutAnyOrigin.
This requires the tensor to already be mutable as casting mutability is inherently very unsafe.
It is usually preferred to maintain concrete origin values instead of
using MutAnyOrigin. However, if it is needed, keep in mind that
MutAnyOrigin can alias any memory value, so Mojo's ASAP
destruction will not apply during the lifetime of the tensor.
Returns:
TileTensor: A TileTensor with the origin set to MutAnyOrigin.
as_immut
as_immut(self) -> TileTensor[dtype, LayoutType, origin_of(origin), address_space=address_space, linear_idx_type=linear_idx_type]
Return an immutable version of this tensor.
Returns:
TileTensor: A TileTensor covering the same elements, but without mutability.
address_space_cast
address_space_cast[target_address_space: AddressSpace](self) -> TileTensor[dtype, LayoutType, origin, address_space=target_address_space, linear_idx_type=linear_idx_type]
Return a version of this tensor cast to a new address space.
Parameters:
- target_address_space (`AddressSpace`): The target address space to cast to.
Returns:
TileTensor: A TileTensor covering the same elements in the new address space.
to_device_buffer
to_device_buffer(self, ctx: DeviceContext) -> DeviceBuffer[dtype]
Convert the tensor to a DeviceBuffer.
Args:
- ctx (`DeviceContext`): The device context to use.
Returns:
DeviceBuffer: A DeviceBuffer containing the tensor's data.