Mojo struct
TileTensor
@register_passable(trivial)
struct TileTensor[mut: Bool, //, dtype: DType, LayoutType: TensorLayout, origin: Origin[mut=mut], *, address_space: AddressSpace = AddressSpace.GENERIC, linear_idx_type: DType = _get_index_type(address_space), element_size: Int = 1]
A tensor type with trait-based layouts supporting nested and hierarchical indexing.
TileTensor provides a flexible abstraction for multi-dimensional data with
layouts expressed via the TensorLayout trait. Unlike LayoutTensor, which
uses a concrete Layout type, TileTensor accepts any type implementing
TensorLayout, enabling more flexible compile-time layout composition.
When to use TileTensor vs LayoutTensor:
- Use `TileTensor` when you need trait-based layout composition, nested layouts, or when working with the newer `Coord`-based layout system.
- Use `LayoutTensor` when you need established operations like `copy_dma`, `collective_load`, or compatibility with existing code using `IntTuple`-based layouts.
- Both types can interoperate via `to_layout_tensor()`.
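A minimal interop sketch, assuming only the `to_layout_tensor()` conversion named above (its exact signature is not shown on this page):

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(0.0)

# Convert to a LayoutTensor view for APIs that expect the
# IntTuple-based layout system (e.g. copy_dma, collective_load).
var lt = tensor.to_layout_tensor()
```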
Example:
```mojo
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx

# Create a 4x4 tensor with row-major layout
var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(0.0)

# Access elements using flat indices
tensor[0, 0] = 1.0
tensor[1, 2] = 2.0

# Extract a 2x2 tile at position (1, 0)
var tile = tensor.tile[2, 2](1, 0)

# Vectorize for SIMD operations (shape becomes 4x1, element size 1x4)
var vec = tensor.vectorize[1, 4]()
```

Parameters
- mut (`Bool`): The inferred mutability of the underlying pointer.
- dtype (`DType`): The data type of tensor elements (e.g., `DType.float32`).
- LayoutType (`TensorLayout`): A type implementing `TensorLayout` that defines the tensor's shape and stride structure. Common types include `Layout` (with `Coord`-based shapes/strides) and `RowMajorLayout`.
- origin (`Origin`): The origin of the underlying pointer for lifetime tracking.
- address_space (`AddressSpace`): Memory address space (GENERIC, SHARED, CONSTANT, etc.). Defaults to GENERIC.
- linear_idx_type (`DType`): Integer type for memory indexing. Defaults to int32 for shared/constant memory, int64 otherwise.
- element_size (`Int`): The number of scalar elements per logical element after vectorization. Defaults to 1.
Fields
- ptr (`UnsafePointer[Scalar[dtype], origin, address_space=address_space]`): Pointer to the tensor's underlying data storage.
- layout (`LayoutType`): The layout instance defining shape and stride mappings.
Implemented traits
AnyType,
Copyable,
DevicePassable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable,
Writable
comptime members
__copy_ctor_is_trivial
comptime __copy_ctor_is_trivial = LayoutType.__copy_ctor_is_trivial
__del__is_trivial
comptime __del__is_trivial = LayoutType.__del__is_trivial
__move_ctor_is_trivial
comptime __move_ctor_is_trivial = LayoutType.__move_ctor_is_trivial
AddressSpaceCastType
comptime AddressSpaceCastType[address_space: AddressSpace] = TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type]
Type alias for address-space-cast result tensors.
Parameters
- address_space (`AddressSpace`): The address space for the result tensor.
all_dims_known
comptime all_dims_known = LayoutType.all_dims_known
True if both shape and stride are fully known at compile time.
Required for operations like vectorize() and distribute().
CoalescedType
comptime CoalescedType = TileTensor[dtype, Layout[ComptimeInt[Coord[LayoutType._shape_types].static_product], ComptimeInt[1]], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Type alias for coalesced (flattened to rank-1) tensor types.
The coalesced tensor has:
- shape: product of all original dimensions
- stride: 1 (contiguous)
- element shape: product of all original element dimensions
- element stride: 1 (contiguous)
device_type
comptime device_type = TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Device-side type for GPU kernel parameter passing.
DynamicType
comptime DynamicType[dyn_dtype: DType] = TileTensor[dtype, Layout[#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, RuntimeInt[dyn_dtype])), #kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, RuntimeInt[dyn_dtype]))], origin, address_space=address_space, linear_idx_type=linear_idx_type]
Type alias for dynamic tensor types.
Parameters
- dyn_dtype (`DType`): The data type for `RuntimeInt` values in the dynamic tensor.
ElementType
comptime ElementType = SIMD[dtype, element_size]
The SIMD type used for element access.
For scalar tensors, this is SIMD[dtype, 1] (equivalent to Scalar[dtype]).
For vectorized tensors, this reflects the vector width.
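As an illustrative sketch assembled from the vectorize example earlier on this page (the exact element-assignment form is an assumption): after `vectorize[1, 4]()`, element accesses read and write `SIMD[DType.float32, 4]` values instead of scalars.

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(0.0)

# Scalar tensor: ElementType is SIMD[DType.float32, 1]
tensor[0, 0] = 1.0

# Vectorized view: shape becomes 4x1, ElementType is SIMD[DType.float32, 4]
var vec = tensor.vectorize[1, 4]()
vec[0, 0] = SIMD[DType.float32, 4](2.0)
```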
flat_rank
comptime flat_rank = Variadic.size[CoordLike](#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[idx].VariadicType if VA[idx].is_tuple else VA[idx])))
The flattened rank - total number of dimensions after flattening nested Coords.
For non-nested layouts, flat_rank == rank. For nested layouts (e.g., from blocked_product), flat_rank > rank.
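A concrete sketch of the distinction (the nested-layout shape is hypothetical; `blocked_product`'s signature is not shown here):

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)

# Non-nested layout: rank == 2 and flat_rank == 2
var t = TileTensor(storage, row_major[4, 4]())
t[1, 2] = 3.0  # indexing always takes flat_rank indices

# A nested layout with rank == 2 but flat_rank == 4 (e.g. produced by
# blocked_product) would instead be indexed as t[i0, i1, i2, i3].
```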
GenericType
comptime GenericType = TileTensor[dtype, LayoutType, origin, linear_idx_type=linear_idx_type]
Type alias for this tensor with GENERIC address space.
Used by constructors that create tensors from Span, DeviceBuffer, or HostBuffer, which all produce GENERIC address space tensors.
is_row_major
comptime is_row_major = (Coord[#kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[1 if (VA[idx].static_value == #kgen.variadic.reduce(#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[(add (mul idx, -1), len(VA), -1)])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(ComptimeInt[1] if (idx == 0)._mlir_value else RuntimeInt[VA[(add idx, -1)].DTYPE if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].DTYPE] if VA[(add idx, -1)].is_static_value.__bool__().__invert__() if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].is_static_value.__bool__().__invert__() else ComptimeInt[(VA[(add idx, -1)].static_value * PrevV[0].static_value)], PrevV))[idx].static_value)._mlir_value else 0]))].static_product == 1 if (Variadic.size[CoordLike](LayoutType._shape_types) == 0)._mlir_value else Coord[#kgen.variadic.splat(ComptimeInt[1], Variadic.size[CoordLike](LayoutType._shape_types)._mlir_value)].static_product)
True if the tensor has row-major (contiguous) strides.
OriginCastType
comptime OriginCastType[mut: Bool, //, origin: Origin[mut=mut]] = TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type]
Type alias for origin-cast result tensors.
Parameters
- mut (`Bool`): The inferred mutability of the new origin.
- origin (`Origin`): The origin for the result tensor.
rank
comptime rank = LayoutType.rank
The number of dimensions in the tensor's layout.
ReshapedType
comptime ReshapedType[*new_shape_types: CoordLike] = TileTensor[dtype, Layout[new_shape_types, #kgen.variadic.reduce(#kgen.variadic.reduce(new_shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[(add (mul idx, -1), len(VA), -1)])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(ComptimeInt[1] if (idx == 0)._mlir_value else RuntimeInt[VA[(add idx, -1)].DTYPE if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].DTYPE] if VA[(add idx, -1)].is_static_value.__bool__().__invert__() if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].is_static_value.__bool__().__invert__() else ComptimeInt[(VA[(add idx, -1)].static_value * PrevV[0].static_value)], PrevV))], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Type alias for reshaped tensor types.
Parameters
- *new_shape_types (`CoordLike`): The shape types for the reshaped tensor.
shape_known
comptime shape_known = LayoutType.shape_known
True if all shape dimensions are compile-time constants.
SIMDVectorizedType
comptime SIMDVectorizedType = TileTensor[dtype, Layout[#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(((VA[idx].static_value + ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()][idx].static_value) - 1) // ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()][idx].static_value)])), #kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(VA[idx].static_value * ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()][idx].static_value)]))], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=Coord[ComptimeInt[1], ComptimeInt[simd_width_of[dtype]()]].static_product]
Result type for SIMD-width vectorization.
static_shape
comptime static_shape[i: Int] = LayoutType.static_shape[i]
Get the compile-time shape value for dimension i, or -1 if dynamic.
Parameters
- i (`Int`): The dimension index.
static_stride
comptime static_stride[i: Int] = LayoutType.static_stride[i]
Get the compile-time stride value for dimension i, or -1 if dynamic.
Parameters
- i (`Int`): The dimension index.
stride_known
comptime stride_known = LayoutType.stride_known
True if all stride dimensions are compile-time constants.
VectorizedType
comptime VectorizedType[*vector_shape: Int] = TileTensor[dtype, Layout[#kgen.variadic.reduce(LayoutType._shape_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(((VA[idx].static_value + #kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))[idx].static_value) - 1) // #kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))[idx].static_value)])), #kgen.variadic.reduce(LayoutType._stride_types, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[(VA[idx].static_value * #kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))[idx].static_value)]))], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=Coord[#kgen.variadic.reduce(vector_shape, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]]))].static_product]
Type alias for vectorized tensor types.
Parameters
- *vector_shape (`Int`): The shape of each vector unit along each axis.
ViewType
comptime ViewType[new_layout: TensorLayout] = TileTensor[dtype, new_layout, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
A TileTensor type with the same data properties but a different layout.
Preserves dtype, origin, address_space, and other properties while replacing LayoutType. Use this to name the return type of reshape() and other layout-changing operations in helper functions.
Parameters
- new_layout (`TensorLayout`): The new TensorLayout type for the view.
Methods
__init__
__init__(var span: Span[Scalar[dtype], origin], var layout: LayoutType) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].GenericType
Create a TileTensor from a Span and layout.
Args:
- span (`Span`): The memory span containing the tensor data.
- layout (`LayoutType`): The layout defining the tensor's shape and strides.
Returns:
TileTensor: The constructed tensor viewing the span's data.
__init__(buffer: NDBuffer[buffer.dtype, buffer.rank, buffer.origin, buffer.shape, buffer.strides, alignment2=buffer.alignment2, address_space=buffer.address_space, exclusive=buffer.exclusive]) -> TileTensor[buffer.dtype, Layout[#kgen.variadic.reduce(buffer.shape.value.value, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Dim], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]._value_or_missing] if (VA[idx] != -31337)._mlir_value else RuntimeInt[DType.int64])), #kgen.variadic.reduce(#kgen.variadic.reduce(#kgen.variadic.reduce(buffer.shape.value.value, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Dim], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]._value_or_missing] if (VA[idx] != -31337)._mlir_value else RuntimeInt[DType.int64])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, VA[(add (mul idx, -1), len(VA), -1)])), base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[CoordLike], idx: __mlir_type.index] #kgen.variadic.concat(ComptimeInt[1] if (idx == 0)._mlir_value else RuntimeInt[VA[(add idx, -1)].DTYPE if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].DTYPE] if VA[(add idx, -1)].is_static_value.__bool__().__invert__() if VA[(add idx, -1)].is_static_value.__bool__().__invert__()._mlir_value else PrevV[0].is_static_value.__bool__().__invert__() else ComptimeInt[(VA[(add idx, -1)].static_value * PrevV[0].static_value)], PrevV))], buffer.origin, address_space=buffer.address_space]
Create a TileTensor from an NDBuffer.
Converts an NDBuffer to a TileTensor, preserving shape and stride
information. Static dimensions in the NDBuffer become ComptimeInt,
dynamic dimensions become RuntimeInt. Strides are computed as
row-major from the shape types via RowMajorLayout, recovering
static stride info that NDBuffer's default all-unknown strides
would lose.
Args:
- buffer (`NDBuffer`): The NDBuffer to convert.
Returns:
TileTensor: The constructed tensor viewing the buffer's data.
__init__(ref[origin] device_buffer: DeviceBuffer[dtype], var layout: LayoutType) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].GenericType
Create a TileTensor from a DeviceBuffer. The layout must have statically known dimensions.
Note that the device buffer memory is on the accelerator device (GPU global memory). Code running on the CPU can use the DeviceContext to allocate a DeviceBuffer and use that to construct a TileTensor that can be accessed on the GPU. You cannot directly access data in the DeviceBuffer or TileTensor from the CPU.
The following example shows a typical pattern for using DeviceBuffer to construct a TileTensor that you can use on the GPU.
```mojo
from std.gpu.host import DeviceContext, DeviceBuffer
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx

comptime dtype = DType.float32
var ctx = DeviceContext()

# Allocate buffers
var dev_buf = ctx.enqueue_create_buffer[dtype](16)
var host_buf = ctx.enqueue_create_host_buffer[dtype](16)

# Ensure buffers have been created
ctx.synchronize()

# Initialize host buffer and copy to device buffer
for i in range(16):
    host_buf[i] = Scalar[dtype](i)
ctx.enqueue_copy(dev_buf, host_buf)

# Create TileTensor to use on device
var tensor = TileTensor(
    dev_buf,
    row_major((Idx[4](), Idx[4]())),
)
...
```

Args:
- device_buffer (`DeviceBuffer`): Contains the underlying data to point to.
- layout (`LayoutType`): The layout of the tensor.
Returns:
TileTensor: The constructed tensor viewing the device buffer's data.
__init__(ref[origin] host_buffer: HostBuffer[dtype], var layout: LayoutType) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].GenericType
Create a TileTensor from a HostBuffer. The layout must have statically known dimensions.
The resulting tensor's data can only be accessed on the CPU.
```mojo
from std.gpu.host import DeviceContext, HostBuffer
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx

comptime dtype = DType.float32
var ctx = DeviceContext()

# Allocate 16 elements to back the 4x4 layout below
var host_buf = ctx.enqueue_create_host_buffer[dtype](16)
var tensor = TileTensor(
    host_buf,
    row_major((Idx[4](), Idx[4]())),
)
```

Args:
- host_buffer (`HostBuffer`): Contains the underlying data to point to.
- layout (`LayoutType`): The layout of the tensor.
Returns:
TileTensor: The constructed tensor viewing the host buffer's data.
@implicit
__init__(other: TileTensor[other.dtype, other.LayoutType, other.origin, address_space=other.address_space, linear_idx_type=other.linear_idx_type, element_size=other.element_size]) -> TileTensor[other.dtype, other.LayoutType, origin_of(other.origin), address_space=other.address_space, linear_idx_type=other.linear_idx_type, element_size=other.element_size]
Implicitly cast a mutable TileTensor to immutable.
Args:
- other (`TileTensor`): The mutable TileTensor to cast from.
Returns:
TileTensor: An immutable view of the same tensor.
__getitem__
__getitem__(self, coord: Coord[coord.element_types]) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].ElementType where (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Retrieve a single element from the tensor at the specified coordinates.
Accepts Coords of flat_rank (flattened).
Args:
- coord (`Coord`): The coordinates specifying the element's position.
Returns:
ElementType: The element at the specified position.
__getitem__[*IndexTypes: Indexer & Copyable](self, *items: *IndexTypes) -> TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].ElementType where (Variadic.size[Indexer & Copyable](IndexTypes) == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Retrieves a single element from the tensor at the specified indices.
Uses flat indexing based on flat_rank. For non-nested layouts, flat_rank == rank, so tensor[i, j, k] works normally. For nested layouts (e.g., from blocked_product), use all flat_rank indices: tensor[i0, i1, i2, i3] for a tensor with flat_rank == 4.
Parameters:
- *IndexTypes (`Indexer & Copyable`): The types of the index arguments.
Args:
- *items (`*IndexTypes`): The indices specifying the element's position.
Returns:
ElementType: The element at the specified position.
__setitem__
__setitem__(self, coord: Coord[coord.element_types], value: SIMD[dtype, element_size]) where mut if (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)._mlir_value else (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Set a single element in the tensor at the specified coordinates.
Accepts Coords of flat_rank (flattened).
Args:
- coord (`Coord`): The coordinates specifying the element's position.
- value (`SIMD`): The value to store.
__setitem__[*IndexTypes: Indexer & Copyable](self, *items: *IndexTypes, *, value: SIMD[dtype, element_size]) where ((Variadic.size[Indexer & Copyable](IndexTypes) == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank) & mut)
Sets a single element in the tensor at the specified indices.
Uses flat indexing based on flat_rank. For non-nested layouts, flat_rank == rank, so tensor[i, j, k] = value works normally. For nested layouts (e.g., from blocked_product), use all flat_rank indices: tensor[i0, i1, i2, i3] = value for a tensor with flat_rank == 4.
Parameters:
- *IndexTypes (`Indexer & Copyable`): The types of the index arguments.
Args:
- *items (`*IndexTypes`): The indices specifying the element's position.
- value (`SIMD`): The value to store.
get_type_name
static get_type_name() -> String
Gets the name of the host type (the one implementing this trait).
Returns:
String: The host type's name.
load
load[width: Int = element_size, alignment: Int = align_of[SIMD[dtype, width]](), invariant: Bool = False](self, coord: Coord[coord.element_types]) -> SIMD[dtype, width] where (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank) if (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)._mlir_value else (Coord[coord.element_types].flat_rank == 1)
Load elements from the tensor at the specified coordinates.
Supports both hierarchical indexing (rank indices) and flat indexing (flat_rank indices) for nested layouts.
Parameters:
- width (`Int`): Number of elements to load (default: element_size).
- alignment (`Int`): Memory alignment for the load.
- invariant (`Bool`): If True, the compiler may assume the memory won't be modified during the kernel, enabling load hoisting and caching.
Args:
- coord (`Coord`): The coordinates specifying the element's position.
Returns:
SIMD: A SIMD vector containing the loaded elements.
store
store[width: Int = element_size, alignment: Int = align_of[SIMD[dtype, width]]()](self, coord: Coord[coord.element_types], value: SIMD[dtype, width]) where mut if (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)._mlir_value else (Coord[coord.element_types].flat_rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].flat_rank)
Store elements to the tensor at the specified coordinates.
Supports both hierarchical indexing (rank indices) and flat indexing (flat_rank indices) for nested layouts.
Parameters:
- width (`Int`): Number of elements to store (default: element_size).
- alignment (`Int`): Memory alignment for the store.
Args:
- coord (`Coord`): The coordinates specifying the element's position.
- value (`SIMD`): The value to store.
ptr_at_offset
ptr_at_offset(self, coords: Coord[coords.element_types]) -> UnsafePointer[Scalar[dtype], origin, address_space=address_space] where (Coord[coords.element_types].rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].rank)
Get a pointer offset at the given flattened coordinates.
Args:
- coords (`Coord`): A flattened list of the offset coordinates.
Returns:
UnsafePointer: A pointer offset at the given flattened coordinates.
prefetch
prefetch(self, coords: Coord[coords.element_types]) where (Coord[coords.element_types].rank == TileTensor[dtype, LayoutType, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size].rank)
Prefetch tensor data at the specified coordinates into cache.
Issues a software prefetch hint to the processor to load the data at coords into the cache hierarchy. This can improve performance by reducing memory latency for subsequent accesses to the same location.
Performance:
- Prefetching is a performance hint and does not guarantee data will be cached.
- Most effective when issued sufficiently ahead of the actual data access.
- Uses high locality prefetch to the data cache, optimized for data that will be accessed multiple times.
- Can substantially reduce memory access latency when issued early enough to hide the miss.
Notes:
- Excessive prefetching can pollute the cache and degrade performance.
- Most beneficial for predictable access patterns that would otherwise cause cache misses.
- No operation is performed on the prefetched data.
Args:
- coords (`Coord`): The coordinates of the data to prefetch.
numel
numel(self) -> Int
Returns the total number of elements in the tensor.
Computes the product of all shape dimensions.
Returns:
Int: The total element count.
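For example, assuming the `row_major` helper used elsewhere on this page:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]())

# Product of the shape dimensions: 4 * 4 == 16
print(tensor.numel())
```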
write_to
write_to(self, mut w: T)
Format and write the tensor's contents to a writer.
This method formats the tensor's contents and writes them to the provided writer. For 2D tensors, it formats the output in a 2D grid. For tensors of other ranks, it prints all values in column-major coordinate order.
Example:
```mojo
from layout import TileTensor
from layout.tile_layout import row_major

def main() raises:
    var storage = InlineArray[Float32, 2 * 3](uninitialized=True)
    var tensor = TileTensor(storage, row_major[2, 3]()).fill(1.0)
    # Internally calls `write_to` with a StringWriterOutput
    print(tensor)
```

Output for the 2x3 tensor:

```
[[1.0, 1.0, 1.0],
 [1.0, 1.0, 1.0]]
```

Notes:
- For 2D tensors, the output is formatted as a 2D grid with rows and columns.
- For tensors of other ranks, values are printed in column-major coordinate order.
- Empty tensors (size 0) produce no output.
- This method is used by the `__str__` method to convert the tensor to a string.
- The formatting is designed for human readability rather than parsing.
- For large tensors, the output may be truncated to avoid excessive output.
Args:
- w (`T`): The writer instance to write the formatted output to.
tile
tile[*tile_sizes: Int](self, coordinates: Coord[coordinates.element_types]) -> TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), LayoutType._stride_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Extract a tile (sub-tensor) with the specified shape at the given coordinates.
Parameters:
- *tile_sizes (`Int`): The dimensions of the tile along each axis.
Args:
- coordinates (`Coord`): The tile coordinates as a Coord.
Returns:
TileTensor: A view into the original tensor representing the specified tile.
tile[*tile_sizes: Int, *, stride_layout: TensorLayout](self, coordinates: Coord[coordinates.element_types]) -> TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), stride_layout._shape_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Tile with explicit static strides.
Use when the parent tensor has dynamic (RuntimeInt) strides but the actual stride values are known at compile time. This produces a tile with all_dims_known=True, enabling vectorize/distribute.
This is needed because TensorLayout trait parameters erase concrete stride types -- the compiler cannot prove all_dims_known through a trait-bounded parameter even when the underlying strides are static.
Parameters:
- *tile_sizes (`Int`): Tile dimensions along each axis.
- stride_layout (`TensorLayout`): The layout providing static stride types.
Args:
- coordinates (`Coord`): Tile coordinates in the grid.
Returns:
TileTensor: A view into the original tensor representing the specified tile.
tile[*tile_sizes: Int](self, *tile_coords: Int) -> TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), LayoutType._stride_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Extract a tile (sub-tensor) from this tensor with specified dimensions and position.
This overload accepts tile coordinates as variadic Int arguments, providing API compatibility with LayoutTensor.
Example:
```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 16](uninitialized=True)
var tensor = TileTensor(storage, row_major[4, 4]()).fill(1.0)

# Extract the tile at position (1, 0) with tile size 2x2
var t = tensor.tile[2, 2](1, 0)
```

Parameters:
- *tile_sizes (`Int`): The dimensions of each tile along each axis.

Args:
- *tile_coords (`Int`): The coordinates of the specific tile to extract.
Returns:
TileTensor: A view into the original tensor representing the specified tile.
tile_with_offset
tile_with_offset[*tile_sizes: Int](self, coordinates: Coord[coordinates.element_types]) -> Tuple[TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), LayoutType._stride_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size], IndexList[Variadic.size[CoordLike](coordinates.element_types)], UInt]
Like tile(), but also returns corner coordinates and linear offset.
Parameters:
- *tile_sizes (`Int`): Tile dimensions along each axis.
Args:
- coordinates (`Coord`): Tile coordinates in the grid.
Returns:
Tuple: Tuple of (tile, corner_coords, offset).
tile_with_offset[*tile_sizes: Int, *, stride_layout: TensorLayout](self, coordinates: Coord[coordinates.element_types]) -> Tuple[TileTensor[dtype, Layout[#kgen.variadic.reduce(tile_sizes, base=, reducer=[PrevV: Variadic[CoordLike], VA: Variadic[Int], idx: __mlir_type.index] #kgen.variadic.concat(PrevV, ComptimeInt[VA[idx]])), stride_layout._shape_types], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size], IndexList[Variadic.size[CoordLike](coordinates.element_types)], UInt]
Like tile(), but with explicit static strides.
Use when the parent has dynamic strides but the values are known at compile time. See tile[stride_layout=...] for details.
Parameters:
- *tile_sizes (`Int`): Tile dimensions along each axis.
- stride_layout (`TensorLayout`): The layout providing static stride types.
Args:
- coordinates (`Coord`): Tile coordinates in the grid.
Returns:
Tuple: Tuple of (tile, corner_coords, offset).
reshape
reshape[new_layout: TensorLayout](self, layout_val: new_layout) -> TileTensor[dtype, new_layout, origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Create a view of the tensor with a different layout.
Returns a new TileTensor sharing the same pointer but with a different layout. This is a zero-cost operation -- only the layout type changes, no data is moved.
Parameters:
- new_layout (`TensorLayout`): The target layout type (inferred from layout_val).
Args:
- layout_val (`new_layout`): The layout instance to use for the new view.
Returns:
TileTensor: A TileTensor with the new layout viewing the same memory.
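A sketch of this overload, assuming `row_major` produces a `TensorLayout` instance as in the examples above:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(0.0)

# Zero-cost view: same pointer, new layout inferred from the argument
var view = tensor.reshape(row_major[2, 6]())
```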
reshape[*new_shape: Int](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Reshape the tensor to a new shape with compile-time dimensions.
This method creates a view of the tensor with a different logical shape while preserving the underlying data. The total number of elements must remain the same, and the tensor must have row-major (contiguous) strides.
Example:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(1.0)
# tensor has shape (3, 4)

var reshaped = tensor.reshape[2, 6]()
# reshaped has shape (2, 6), same underlying data

var reshaped_1d = tensor.reshape[12]()
# reshaped_1d has shape (12,), equivalent to coalesce()
```

Performance:
- Creates a view without copying data.
- Zero-cost abstraction at compile time when used with static shapes.
Constraints:
- All dimensions must be statically known (`all_dims_known`).
- The tensor must have row-major strides (`is_row_major`).
- The product of the new shape must equal the product of the original shape.
Parameters:
- *new_shape (`Int`): The new shape dimensions as compile-time integers.
Returns:
TileTensor: A TileTensor with the new shape and row-major strides, sharing
the same underlying data as the original tensor.
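The constraint above can be sketched in Python (illustrative only, not the Mojo API; `reshape_layout` is a hypothetical helper): a reshape is valid only when element counts match, and the resulting view keeps row-major strides derived from the new shape.

```python
import math

# Illustrative sketch of the reshape constraint: element counts must
# match, and the view gets fresh row-major strides for the new shape.
def reshape_layout(old_shape, new_shape):
    assert math.prod(old_shape) == math.prod(new_shape), "element counts must match"
    # Row-major strides: each stride is the product of the dims to its right.
    strides = tuple(math.prod(new_shape[i + 1:]) for i in range(len(new_shape)))
    return new_shape, strides

# A (3, 4) tensor reshaped to (2, 6) keeps 12 elements, strides (6, 1).
print(reshape_layout((3, 4), (2, 6)))
```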
reshape[*new_shape_types: CoordLike](self, new_shape: Coord[new_shape_types]) -> TileTensor[dtype, Layout[new_shape_types, ...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Reshape the tensor to a new shape specified as a Coord.
This method creates a view of the tensor with a different logical shape while preserving the underlying data. The total number of elements must remain the same, and the tensor must have row-major (contiguous) strides.
This overload accepts shapes with runtime dimensions, performing the element count validation at runtime when needed.
Example:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor
from layout import Idx, Coord

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(1.0)

# Reshape with runtime-determined dimensions
var rows = 2
var cols = 6
var reshaped = tensor.reshape(Coord(Idx(rows), Idx(cols)))
```

Performance:
- Creates a view without copying data.
- May include runtime validation for dynamic shapes.
Constraints:
- The tensor must have row-major strides (`is_row_major`).
- The product of the new shape must equal the product of the original shape (validated at runtime for dynamic shapes).
Parameters:
- *new_shape_types (`CoordLike`): The types of the new shape dimensions (inferred).
Args:
- new_shape (`Coord`): The new shape as a Coord.
Returns:
TileTensor: A TileTensor with the new shape and row-major strides, sharing
the same underlying data as the original tensor.
distribute
distribute[thread_layout: Layout, swizzle: Optional[Swizzle] = None](self, thread_id: Int) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Distribute tensor workload across multiple threads in a structured pattern.
This method partitions a tensor across multiple threads for parallel processing, assigning each thread a specific portion of the tensor. The distribution pattern is determined by the thread_layout parameter, which defines the logical arrangement of threads.
Parameters:
- thread_layout (`Layout`): Defines the logical arrangement of threads (e.g., a 2x2 grid of 4 threads). This layout determines how the tensor is partitioned.
- swizzle (`Optional[Swizzle]`): Optional. A function that remaps the distribution pattern to improve memory access patterns or cache locality.
Args:
- thread_id (`Int`): The ID of the current thread (0-based).
Returns:
TileTensor: A view into the original tensor representing the portion assigned to
this thread.
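The partitioning can be sketched in Python (illustrative only, not the Mojo API; `distribute_view` is a hypothetical helper). Per the return type above, each thread's view has `shape[i] // thread_shape[i]` elements per dimension with strides scaled by the thread shape, so threads interleave across the tensor:

```python
# Illustrative sketch: distribute a row-major tensor across a thread grid.
# Each thread's view: shape[i] //= threads[i], stride[i] *= threads[i].
def distribute_view(shape, strides, threads, thread_id):
    # Thread coordinates in a row-major thread grid.
    t_row, t_col = divmod(thread_id, threads[1])
    new_shape = (shape[0] // threads[0], shape[1] // threads[1])
    new_strides = (strides[0] * threads[0], strides[1] * threads[1])
    # Offset of this thread's first element in the flat buffer.
    offset = t_row * strides[0] + t_col * strides[1]
    # Flat indices owned by this thread.
    return [
        offset + i * new_strides[0] + j * new_strides[1]
        for i in range(new_shape[0])
        for j in range(new_shape[1])
    ]

# A 4x4 row-major tensor (strides (4, 1)) over a 2x2 thread grid:
# thread 0 owns rows {0, 2} x cols {0, 2}, i.e. flat indices [0, 2, 8, 10].
print(distribute_view((4, 4), (4, 1), (2, 2), 0))
```

Together the four threads cover all 16 elements exactly once.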
distribute_with_offset
distribute_with_offset[thread_layout: Layout, swizzle: Optional[Swizzle] = None](self, thread_id: Int) -> Tuple[TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size], IndexList[...], UInt]
Like distribute(), but also returns thread coordinates and offset.
Parameters:
- thread_layout (`Layout`): Defines the logical arrangement of threads.
- swizzle (`Optional[Swizzle]`): Optional swizzle function.
Args:
- thread_id (`Int`): The ID of the current thread (0-based).
Returns:
Tuple: Tuple of (distributed_tensor, thread_coords, offset).
fill
fill[*, use_runtime_layout: Bool = ...](self, val: Scalar[dtype]) -> Self where mut
Fill the entire tensor with a single value.
This method sets all elements of the tensor to the specified value. It works with both statically and dynamically shaped tensors.
For statically known layouts, the fill operation is unrolled at compile time. For dynamic layouts, a runtime loop is used. No vectorization is applied, so performance may be suboptimal for large tensors. Consider using hardware-specific fill operations for better performance with large tensors.
This method can be used with tensors of any rank and shape. The
fill operation respects the tensor's layout, filling all
elements regardless of how they are arranged in memory. For
tensors with element_layout, all elements within each logical element
are filled with the same value.
Example:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

def main() raises:
    var storage = InlineArray[Float32, 3 * 4](uninitialized=True)
    var tensor = TileTensor(storage, row_major[3, 4]()).fill(0.0)
    print(tensor)
```

If not using method chaining, you can either reassign the result to the tensor variable, or assign the result to the discard pattern (`_`) to avoid warnings about an unused value:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

var storage = InlineArray[Float32, 3 * 4](uninitialized=True)
var tensor = TileTensor(storage, row_major[3, 4]()).fill(0.0)
tensor = tensor.fill(0.0)
# or
_ = tensor.fill(0.0)
```

Parameters:
- use_runtime_layout (`Bool`): Whether to use the runtime layout for filling. Defaults to `True` if the layout is not statically known. If loop bounds are too large, it is better to use the runtime layout to avoid long compilation times.
Args:
- val (`Scalar`): The value to fill the tensor with. Must be of the same data type as the tensor.
Returns:
Self: The tensor itself (self), allowing for method chaining.
dim
dim[i: Int](self) -> Scalar[linear_idx_type]
Returns the size of dimension i.
Parameters:
- i (`Int`): The dimension index (compile-time constant).
Returns:
Scalar: The size of dimension i as a scalar.
dim[IndexType: Indexer](self, index: IndexType) -> Scalar[linear_idx_type]
Returns the size of the specified dimension.
Parameters:
- IndexType (`Indexer`): The type of the index argument.

Args:
- index (`IndexType`): The dimension index (runtime value).
Returns:
Scalar: The size of the specified dimension as a scalar.
dynamic_stride
dynamic_stride[IndexType: Indexer](self, index: IndexType) -> Scalar[linear_idx_type]
Returns the stride of the specified dimension.
Parameters:
- IndexType (`Indexer`): The type of the index argument.

Args:
- index (`IndexType`): The dimension index (runtime value).
Returns:
Scalar: The stride of the specified dimension as a scalar.
slice
slice[*slices: ContiguousSlice](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Extract a slice from the tensor using slice objects.
This method creates a view into a subset of the tensor defined by the slice specifications for each dimension. The slice is a continuous region of the tensor with no gaps (step size must be 1 for all dimensions).
The number of slice arguments must match the tensor rank.
Example:
For a 3D tensor, you can slice all three dimensions:

```mojo
from layout.tile_layout import row_major
from layout import TileTensor

comptime layout_3d = row_major[16, 16, 16]()
var stack = InlineArray[UInt8, layout_3d.static_product](fill=0)
var tensor_3d = TileTensor(stack, layout_3d)
var slice = tensor_3d.slice[0:2, 1:3, 0:4]()
```

Performance:
- Creates a view without copying data, making it very efficient.
- Maintains the original tensor's stride information for efficient memory access.
- Zero-cost abstraction at runtime when used with compile-time constant slices.
Notes:
- The slice is a view into the original tensor, so modifications to the slice will affect the original tensor.
- Works with tensors of any rank (must provide one slice per dimension).
- The step size must be 1 for all dimensions (no gaps allowed).
- Slice bounds are not checked at runtime; accessing out-of-bounds indices will result in undefined behavior.
- Shape and stride types are converted to RuntimeInt in the sliced tensor, even if the original tensor had ComptimeInt dimensions. This is necessary because we can't change ComptimeInt[4] to ComptimeInt[2] in the type system.
Parameters:
- *slices (`ContiguousSlice`): Slice specifications for each dimension. Each slice defines the start and end indices for that dimension.
Returns:
TileTensor: A view into the original tensor representing the specified slice.
The returned tensor has the same rank but smaller dimensions.
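The view arithmetic can be sketched in Python (illustrative only, not the Mojo API; `slice_view` is a hypothetical helper): a contiguous slice shrinks each dimension to `end - start`, keeps the original strides, and starts at a base offset of `sum(start[i] * stride[i])`.

```python
# Illustrative sketch of contiguous-slice view arithmetic.
def slice_view(shape, strides, slices):
    new_shape = tuple(end - start for start, end in slices)
    offset = sum(start * stride for (start, _), stride in zip(slices, strides))
    return new_shape, strides, offset

# Slicing [0:2, 1:3, 0:4] from a 16x16x16 row-major tensor
# (strides (256, 16, 1)) gives shape (2, 2, 4) starting at offset 16.
print(slice_view((16, 16, 16), (256, 16, 1), [(0, 2), (1, 3), (0, 4)]))
```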
slice(self, *slices: Tuple[Int, Int]) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=element_size]
Slice tensor with runtime start/end indices.
Unlike slice[]() which requires compile-time bounds, this method
accepts runtime indices for fully dynamic slicing. Each argument is
a (start, end) tuple for that dimension, matching the dimension-major
ordering of the compile-time slice method.
Example:

```mojo
# For a 2D tensor, slice rows 1:3 and columns 2:5
var sliced = tensor.slice((1, 3), (2, 5))
```

Args:
- *slices (`Tuple[Int, Int]`): Variadic (start, end) tuples, one per dimension.
Returns:
TileTensor: A view into the sliced region with RuntimeInt shape.
vectorize
vectorize[*vector_shape: Int](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type, element_size=...]
Reshape a tensor into a vectorized form for efficient SIMD operations.
This method transforms the tensor's logical layout to enable efficient vectorized processing, treating blocks of elements as vector units. The transformation is particularly useful for SIMD (Single Instruction Multiple Data) operations and hardware acceleration.
The vector shape is tracked in element_size.
Example:
For a 16x16 tensor, vectorize[4, 4] will produce a 4x4 tensor
where each element position is the starting point of a 4x4 block
from the original tensor. The strides are scaled by the vector shape
so that adjacent elements in the vectorized tensor are spaced apart
by the vector dimensions.
Performance:
- Creates a view without copying data, making it very efficient.
- Enables strided access patterns suitable for SIMD vector loads.
- Zero-cost abstraction at compile time when used with static shapes.
Constraints:
All dimensions must be statically known (all_dims_known).
Parameters:
- *vector_shape (`Int`): The dimensions of each vector unit along each axis of the tensor. For example, in a 2D tensor, `vectorize[4, 4]` treats 4x4 blocks as vector units.
Returns:
TileTensor: A view of the tensor with a vectorized layout, where each element in
the resulting tensor represents the start of a vector block from the
original tensor. The element layout is tracked via
element_size (the vector shape).
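The layout transformation can be sketched in Python (illustrative only, not the Mojo API; `vectorize_layout` is a hypothetical helper): each dimension is divided (rounding up) by the vector shape, strides are scaled by it, and `element_size` becomes the product of the vector shape.

```python
import math

# Illustrative sketch of the vectorize[] layout transformation.
def vectorize_layout(shape, strides, vector_shape):
    new_shape = tuple(math.ceil(s / v) for s, v in zip(shape, vector_shape))
    new_strides = tuple(st * v for st, v in zip(strides, vector_shape))
    element_size = math.prod(vector_shape)
    return new_shape, new_strides, element_size

# A 16x16 row-major tensor (strides (16, 1)) vectorized into 4x4 blocks
# becomes a 4x4 tensor of blocks with strides (64, 4), element_size 16.
print(vectorize_layout((16, 16), (16, 1), (4, 4)))
```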
vectorize(self) -> Self.SIMDVectorizedType
Return a SIMD-width vectorized view of this tensor.
This is a convenience method that vectorizes along the last dimension by the SIMD width for the tensor's dtype.
Returns:
TileTensor: A Self.VectorizedType[1, simd_width_of[Self.dtype]()] view whose
last dimension stride equals the SIMD width for the tensor's dtype.
coalesce
coalesce(self) -> Self.CoalescedType
Creates a rank-1 tensor by flattening all dimensions.
Coalescing combines all dimensions into a single contiguous dimension. This is useful for operations that need to iterate over all elements sequentially.
Example:
For a 4x4 tensor, coalesce() produces a 16-element rank-1 tensor.
For a vectorized tensor with shape (4, 4) and element shape (4, 4),
coalescing produces shape (16,) with element shape (16,).
Performance:
- Creates a view without copying data.
- Enables simple sequential iteration over all elements.
- Zero-cost abstraction at compile time.
Constraints:
All dimensions must be statically known (all_dims_known).
The tensor must have row-major (contiguous) strides (is_row_major).
Returns:
TileTensor: A rank-1 tensor with shape equal to the product of all original
dimensions and stride 1. Element layout is also coalesced.
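The resulting layout can be sketched in Python (illustrative only, not the Mojo API; `coalesce_layout` is a hypothetical helper): for a row-major tensor, all dimensions collapse into one contiguous dimension whose length is the product of the original shape.

```python
import math

# Illustrative sketch: coalescing a row-major tensor yields a rank-1
# layout of length prod(shape) with stride 1.
def coalesce_layout(shape):
    return (math.prod(shape),), (1,)

# A 4x4 tensor coalesces to a 16-element rank-1 view.
print(coalesce_layout((4, 4)))
```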
make_dynamic
make_dynamic[dyn_dtype: DType](self) -> TileTensor[dtype, Layout[...], origin, address_space=address_space, linear_idx_type=linear_idx_type]
Convert all elements in shape and stride to RuntimeInt[dyn_dtype].
Examples:

```mojo
from layout import TileTensor
from layout.tile_layout import row_major

var storage = InlineArray[Float32, 12](uninitialized=True)
var tensor = TileTensor(Span(storage), row_major[3, 4]())
var dynamic = tensor.make_dynamic[DType.int64]()
# dynamic has RuntimeInt[DType.int64] for all shape/stride dimensions
```

Parameters:
- dyn_dtype (`DType`): The data type for the resulting RuntimeInt values.
Returns:
TileTensor: A new TileTensor where all elements in shape and stride
are converted to RuntimeInt[dyn_dtype].
to_layout_tensor
to_layout_tensor(self) -> LayoutTensor[dtype, Layout(coord_to_int_tuple[LayoutType._shape_types](), coord_to_int_tuple[LayoutType._stride_types]()), origin, address_space=address_space]
Return a LayoutTensor with the same shape, stride, and address space of this tensor. Currently it expects flat layouts.
This is a utility to help with porting LayoutTensor methods to this type.
Returns:
LayoutTensor: A LayoutTensor with the same shape, stride, and address space of
this tensor.
as_any_origin
as_any_origin(self) -> TileTensor[dtype, LayoutType, AnyOrigin[mut=mut._mlir_value], address_space=address_space, linear_idx_type=linear_idx_type]
Casts the origin of this mutable TileTensor to MutAnyOrigin.
This requires the tensor to already be mutable as casting mutability is inherently very unsafe.
It is usually preferred to maintain concrete origin values instead of
using MutAnyOrigin. However, if it is needed, keep in mind that
MutAnyOrigin can alias any memory value, so Mojo's ASAP
destruction will not apply during the lifetime of the tensor.
Returns:
TileTensor: A TileTensor with the origin set to MutAnyOrigin.
as_immut
as_immut(self) -> TileTensor[dtype, LayoutType, origin_of(origin), address_space=address_space, linear_idx_type=linear_idx_type]
Return an immutable version of this tensor.
Returns:
TileTensor: A TileTensor covering the same elements, but without mutability.
address_space_cast
address_space_cast[target_address_space: AddressSpace](self) -> TileTensor[dtype, LayoutType, origin, address_space=target_address_space, linear_idx_type=linear_idx_type]
Return a version of this tensor cast to a new address space.
Parameters:
- target_address_space (`AddressSpace`): The target address space to cast to.
Returns:
TileTensor: A TileTensor covering the same elements in the new address space.
to_device_buffer
to_device_buffer(self, ctx: DeviceContext) -> DeviceBuffer[dtype]
Convert the tensor to a DeviceBuffer.
Args:
- ctx (`DeviceContext`): The device context to use.
Returns:
DeviceBuffer: A DeviceBuffer containing the tensor's data.