Mojo module

layout_tensor

Provides the LayoutTensor type for representing multidimensional data.

`comptime` values

`binary_op_type`

comptime binary_op_type = fn[dtype: DType, width: Int](lhs: SIMD[dtype, width], rhs: SIMD[dtype, width]) -> SIMD[dtype, width]

Type alias for binary operations on SIMD vectors.

This type represents a function that takes two SIMD vectors of the same type and width and returns a SIMD vector of the same type and width.

Args:

dtype: The data type of the SIMD vector elements.
width: The width of the SIMD vector.
lhs: Left-hand side SIMD vector operand.
rhs: Right-hand side SIMD vector operand.

Returns: A SIMD vector containing the result of the binary operation.
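
As a rough illustration of a function whose signature matches `binary_op_type`, the sketch below defines an element-wise addition over two SIMD vectors. The name `add_op` and the sample values are hypothetical and chosen only for this example; how such a function is passed to `LayoutTensor` operations is not shown here.

```mojo
# Hypothetical example: a function with the same shape as binary_op_type.
# It is parameterized on dtype and width, takes two SIMD vectors of that
# type and width, and returns a SIMD vector of the same type and width.
fn add_op[dtype: DType, width: Int](
    lhs: SIMD[dtype, width], rhs: SIMD[dtype, width]
) -> SIMD[dtype, width]:
    # Element-wise addition across all lanes.
    return lhs + rhs


def main():
    # Sample inputs (assumed values, for illustration only).
    var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var b = SIMD[DType.float32, 4](10.0, 20.0, 30.0, 40.0)

    # dtype and width are inferred from the arguments.
    print(add_op(a, b))
```

Any other lane-wise operation (subtraction, multiplication, and so on) with the same parameter and argument lists would fit the same type.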

`OpaquePointer`

comptime OpaquePointer = LegacyUnsafePointer[NoneType]

Legacy OpaquePointer migration helper.

`UnsafePointer`

comptime UnsafePointer = LegacyUnsafePointer[?, address_space=?, origin=?]

Legacy UnsafePointer migration helper.

Structs

LayoutTensor: A high-performance tensor with explicit memory layout and hardware-optimized access patterns.
LayoutTensorIter: Iterator for traversing a memory buffer with a specific layout.
ThreadScope: Represents the scope of thread operations in GPU programming.

Functions

copy_dram_to_local: Efficiently copy data from global memory (DRAM) to registers for AMD GPUs.
copy_dram_to_sram: Synchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
copy_dram_to_sram_async: Asynchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
copy_local_to_dram: Efficiently copy data from registers (LOCAL) to global memory (DRAM).
copy_local_to_local: Synchronously copy data between local memory (register) tensors with type conversion.
copy_local_to_shared: Synchronously copy data from local memory (registers) to SRAM (shared memory).
copy_sram_to_dram: Synchronously copy data from SRAM (shared memory) to DRAM (global memory).
copy_sram_to_local: Synchronously copy data from SRAM (shared memory) to local memory.
cp_async_k_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with K-major layout.
stack_allocation_like: Create a stack-allocated tensor with the same layout as an existing tensor.
