
Mojo function

buffer_load

buffer_load[dtype: DType, width: Int, *, cache_policy: CacheOperation = CacheOperation(0)](src_resource: SIMD[uint32, 4], gds_offset: SIMD[int32, 1]) -> SIMD[dtype, width]

Loads data from global memory into a SIMD register with cache operation control.

This function performs a hardware global memory load that maps directly to the AMDGPU buffer_load instruction. It transfers data from global memory into registers, with cache behavior selected by the cache_policy parameter.

Note:

  • Only supported on AMD GPUs.
  • Provides high-level cache control via CacheOperation enum values.
  • Supports combinations of dtype and width for which width * sizeof(dtype) is 1, 2, 4, 8, or 16 bytes.
  • Maps directly to the llvm.amdgcn.raw.buffer.load intrinsics.
  • Cache control bits:
    • SC[1:0] controls coherency scope: 0=wave, 1=group, 2=device, 3=system.
    • nt=True (non-temporal): use streaming-optimized cache policies, recommended for data that is not expected to be reused soon.
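The size rule above can be checked with a small host-side model. This is a plain Python sketch, not Mojo and not part of the library. The dtype-name table and the helper names `load_size_bytes` and `is_supported` are hypothetical; only the set of 1, 2, 4, 8, and 16 byte load sizes comes from this page.

```python
# Element sizes in bytes for a few common dtypes. These are illustrative
# strings, not Mojo DType values (hypothetical table for this sketch).
DTYPE_SIZE_BYTES = {
    "int8": 1, "uint8": 1,
    "float16": 2, "bfloat16": 2,
    "float32": 4, "int32": 4, "uint32": 4,
    "float64": 8, "int64": 8,
}

# Load sizes, in bytes, that buffer_load supports according to the note above.
SUPPORTED_LOAD_BYTES = {1, 2, 4, 8, 16}


def load_size_bytes(dtype: str, width: int) -> int:
    """Total bytes moved by one buffer_load[dtype, width] call."""
    return DTYPE_SIZE_BYTES[dtype] * width


def is_supported(dtype: str, width: int) -> bool:
    """True if width * sizeof(dtype) is one of the supported load sizes."""
    return load_size_bytes(dtype, width) in SUPPORTED_LOAD_BYTES


if __name__ == "__main__":
    for dtype, width in [("float32", 4), ("float16", 8), ("float32", 8), ("uint8", 3)]:
        print(dtype, width, load_size_bytes(dtype, width), is_supported(dtype, width))
```

For example, four float32 values (16 bytes) or eight float16 values (16 bytes) fit the rule. Eight float32 values (32 bytes) do not fit, and neither do three uint8 values (3 bytes).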

Parameters:

  • dtype (DType): The data type to load.
  • width (Int): The SIMD vector width for vectorized loads.
  • cache_policy (CacheOperation): Cache operation policy controlling cache behavior at all levels.

Args:

  • src_resource (SIMD): Buffer resource descriptor created by make_buffer_resource().
  • gds_offset (SIMD): Offset in elements (not bytes) from the base address in the resource, so the byte offset is gds_offset * sizeof(dtype).
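To illustrate the element-offset semantics, the following sketch reads `width` consecutive elements starting `gds_offset` elements into a byte buffer. It is a host-side Python model only, not the Mojo API. The function `model_buffer_load` is hypothetical, and the raw bytes stand in for the four-word resource descriptor. The model raises `struct.error` on out-of-range reads; it does not describe what the hardware does in that case.

```python
import struct


def model_buffer_load(memory: bytes, fmt: str, width: int, gds_offset: int) -> tuple:
    """Hypothetical model: load `width` elements starting at element gds_offset.

    fmt is a struct format character for one element, e.g. "f" for float32.
    Raises struct.error if the read runs past the end of `memory`.
    """
    elem_size = struct.calcsize("<" + fmt)  # sizeof(dtype) in bytes
    byte_offset = gds_offset * elem_size    # convert elements to bytes
    # Unpack `width` consecutive little-endian elements starting at byte_offset.
    return struct.unpack_from("<" + fmt * width, memory, byte_offset)


if __name__ == "__main__":
    # Eight float32 values 0.0 .. 7.0 laid out contiguously.
    memory = struct.pack("<8f", *[float(i) for i in range(8)])
    # Element offset 2 corresponds to byte offset 8 for float32.
    print(model_buffer_load(memory, "f", 4, 2))
```

With float32 data, `gds_offset = 2` skips 8 bytes, and a width-4 load returns the elements at indices 2 through 5.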

Returns:

SIMD: A SIMD[dtype, width] vector containing the loaded data.
