IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo struct

TileLoaderCPAsync

struct TileLoaderCPAsync[dtype: DType, src_layout: TensorLayout, thread_layout: Layout[thread_layout.shape_types, thread_layout.stride_types], swizzle_mode: TensorMapSwizzle, vector_size: Int]

Software-based tile loader using cp.async instructions.

This loader uses cp.async instructions to copy data asynchronously from global memory to shared memory. Bounds checking is done manually in software, and the shared memory layout is swizzled to avoid bank conflicts.
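To see why swizzling helps, the following Python sketch models NVIDIA shared memory as 32 banks of 4 bytes each and compares bank assignments with and without an XOR swizzle. It is a conceptual model only, not Mojo API: the 128-byte row width, the 16-byte chunk size, and the helper names `swizzled_offset` and `first_bank` are assumptions for illustration, and the pattern selected by swizzle_mode in this loader may differ.

```python
BANKS = 32          # shared memory banks on NVIDIA GPUs
BANK_BYTES = 4      # each bank is 4 bytes wide
CHUNK_BYTES = 16    # assumed size of one vectorized copy
ROW_BYTES = 128     # assumed row width of the shared memory tile
CHUNKS_PER_ROW = ROW_BYTES // CHUNK_BYTES  # 8 chunks per row


def swizzled_offset(row: int, chunk: int) -> int:
    """Byte offset of a 16-byte chunk after XOR-ing its index with the row."""
    swizzled_chunk = chunk ^ (row % CHUNKS_PER_ROW)
    return row * ROW_BYTES + swizzled_chunk * CHUNK_BYTES


def first_bank(byte_offset: int) -> int:
    """Bank that holds the first 4-byte word at this byte offset."""
    return (byte_offset // BANK_BYTES) % BANKS


# Eight accesses to chunk 0 in eight consecutive rows (a column access).
plain = [first_bank(row * ROW_BYTES) for row in range(8)]
swizzled = [first_bank(swizzled_offset(row, 0)) for row in range(8)]

print("plain banks:   ", plain)
print("swizzled banks:", swizzled)
```

In this model the unswizzled accesses all start at bank 0 and would serialize, while the swizzled accesses each start at a different bank. The XOR only permutes chunks within a row, so no data is lost or duplicated.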

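Parameters

  • dtype (DType):
  • src_layout (TensorLayout):
  • thread_layout (Layout[thread_layout.shape_types, thread_layout.stride_types]):
  • swizzle_mode (TensorMapSwizzle):
  • vector_size (Int):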

Fields

  • src (TileTensor[dtype, src_layout, ImmutAnyOrigin]): Source tensor that tiles are loaded from.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TileLoader, TrivialRegisterPassable

Methods

__init__

def __init__(src: TileTensor[dtype, src_layout, ImmutAnyOrigin]) -> Self

Initialize the cp.async tile loader.

Args:

  • src (TileTensor[dtype, src_layout, ImmutAnyOrigin]): Source tensor that tiles are loaded from.

load_tile

def load_tile(self, dst: TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=dst.element_size], mem_barrier: UnsafePointer[SharedMemBarrier, address_space=AddressSpace.SHARED], coords: Tuple[Int, Int])

Load a tile using cp.async instructions.

Extracts a tile from the source tensor and performs an asynchronous copy to shared memory with bounds checking and swizzling.

Note: Unlike TMA-based loading, this method expects tile indices, not element offsets. It converts the tile indices to element offsets internally via the tile() method.
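The tile-index convention in the note above can be illustrated with a small Python sketch. It is a conceptual model only, not the loader's implementation: the helper name `tile_bounds`, the (row, column) ordering of the indices, and the clipping of partial tiles at the tensor edge are assumptions made for illustration.

```python
def tile_bounds(tile_coords, tile_shape, tensor_shape):
    """Map (tile_row, tile_col) to the element range that the tile covers.

    Returns (row_start, col_start, valid_rows, valid_cols). The valid extents
    are clipped so that a tile overhanging the tensor edge is only partly read.
    """
    tile_row, tile_col = tile_coords
    tile_h, tile_w = tile_shape
    rows, cols = tensor_shape

    # Tile indices become element offsets by scaling with the tile shape.
    row_start = tile_row * tile_h
    col_start = tile_col * tile_w

    # Bounds check: count only the elements that lie inside the tensor.
    valid_rows = max(0, min(tile_h, rows - row_start))
    valid_cols = max(0, min(tile_w, cols - col_start))
    return row_start, col_start, valid_rows, valid_cols


# A 100 x 200 tensor split into 64 x 64 tiles; tile (1, 3) overhangs both edges.
print(tile_bounds((1, 3), (64, 64), (100, 200)))  # → (64, 192, 36, 8)
```

A caller passing tile indices (1, 3) therefore never computes the element offset (64, 192) itself. How the real loader treats the out-of-bounds remainder of a partial tile is not specified here.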

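Args:

  • dst (TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=dst.element_size]): Destination tile in shared memory.
  • mem_barrier (UnsafePointer[SharedMemBarrier, address_space=AddressSpace.SHARED]):
  • coords (Tuple[Int, Int]): Tile indices of the tile to load.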