Mojo module

layout_tensor

Provides the LayoutTensor type for representing multidimensional data.

Aliases

  • binary_op_type = fn[DType, Int](lhs: SIMD[$0, $1], rhs: SIMD[$0, $1]) -> SIMD[$0, $1]: Type alias for binary operations on SIMD vectors. This type represents a function that takes two SIMD vectors of the same type and width and returns a SIMD vector of the same type and width.

    Parameters:
      • type: The data type of the SIMD vector elements.
      • width: The width of the SIMD vector.

    Args:
      • lhs: Left-hand side SIMD vector operand.
      • rhs: Right-hand side SIMD vector operand.

    Returns: A SIMD vector containing the result of the binary operation.
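
    For illustration, a minimal sketch of a function conforming to this alias follows; the name `add_op` is hypothetical and not part of the module.

    ```mojo
    # A hypothetical element-wise addition matching binary_op_type:
    # both operands and the result share the same dtype and width.
    fn add_op[
        dtype: DType, width: Int
    ](lhs: SIMD[dtype, width], rhs: SIMD[dtype, width]) -> SIMD[dtype, width]:
        return lhs + rhs
    ```

    A function like this can then be supplied as a compile-time parameter wherever the module expects a `binary_op_type`.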

Structs

  • LayoutTensor: A high-performance tensor with explicit memory layout and hardware-optimized access patterns (see the construction sketch after this list).
  • LayoutTensorIter: Iterator for traversing a memory buffer with a specific layout.
  • ThreadScope: Represents the scope of thread operations in GPU programming.
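
For orientation, the sketch below constructs a small LayoutTensor over stack storage. It assumes the InlineArray-backed constructor and the `Layout.row_major` factory; treat it as illustrative rather than canonical.

```mojo
from layout import Layout, LayoutTensor


def main():
    # A 4x4 row-major layout, fixed at compile time.
    alias layout = Layout.row_major(4, 4)

    # Backing storage sized to the layout (assumed constructor:
    # LayoutTensor wrapping an InlineArray of matching length).
    var storage = InlineArray[Scalar[DType.float32], layout.size()](fill=0)
    var tensor = LayoutTensor[DType.float32, layout](storage)

    # Elements are addressed with multidimensional indexing.
    tensor[1, 2] = 42.0
    print(tensor[1, 2])
```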

Functions

  • copy: Synchronously copy data from local memory (registers) to SRAM (shared memory).
  • copy_dram_to_local: Efficiently copy data from global memory (DRAM) to registers for AMD GPUs.
  • copy_dram_to_sram: Synchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context.
  • copy_dram_to_sram_async: Asynchronously copy data from DRAM (global memory) to SRAM (shared memory) in a GPU context (see the staging sketch after this list).
  • copy_local_to_dram: Efficiently copy data from registers (LOCAL) to global memory (DRAM).
  • copy_local_to_local: Synchronously copy data between local memory (register) tensors with type conversion.
  • copy_sram_to_dram: Synchronously copy data from SRAM (shared memory) to DRAM (global memory).
  • copy_sram_to_local: Synchronously copy data from SRAM (shared memory) to local memory.
  • cp_async_k_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with K-major layout.
  • cp_async_mn_major: Asynchronously copy data from DRAM to SRAM using TMA (Tensor Memory Accelerator) with MN-major layout.
  • stack_allocation_like: Create a stack-allocated tensor with the same layout as an existing tensor.
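
The copy functions above are typically used inside GPU kernels to stage tiles of global memory into shared memory before compute. The sketch below shows the asynchronous DRAM-to-SRAM pattern; the tile shape, the 8x4 `thread_layout`, the origin parameter, and the exact import paths are assumptions for illustration, not verified signatures.

```mojo
from gpu import barrier
from gpu.memory import AddressSpace, async_copy_wait_all
from layout import Layout, LayoutTensor
from layout.layout_tensor import copy_dram_to_sram_async

alias BM = 32
alias BK = 32
alias dtype = DType.float32


fn stage_tile(
    src: LayoutTensor[dtype, Layout.row_major(BM, BK), MutableAnyOrigin]
):
    # Shared-memory destination tile with the same shape as the source.
    var smem_tile = LayoutTensor[
        dtype,
        Layout.row_major(BM, BK),
        MutableAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    # Distribute the copy across an assumed 8x4 arrangement of threads;
    # the copy is issued asynchronously and must complete before use.
    copy_dram_to_sram_async[thread_layout = Layout.row_major(8, 4)](
        smem_tile, src
    )
    async_copy_wait_all()
    barrier()
```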