Mojo function
buffer_load
buffer_load[dtype: DType, width: Int, *, cache_policy: CacheOperation = CacheOperation(0)](src_resource: SIMD[uint32, 4], gds_offset: SIMD[int32, 1]) -> SIMD[dtype, width]
Loads data from global memory into a SIMD register with cache operation control.
This function maps directly to the AMDGPU buffer_load instruction. It transfers data from global memory to registers, with cache behavior selected through the cache_policy parameter.
Note:
- Only supported on AMD GPUs.
- Provides high-level cache control via CacheOperation enum values.
- Supports widths that map to 1, 2, 4, 8, or 16 byte loads.
- Maps directly to llvm.amdgcn.raw.buffer.load intrinsics.
- Cache control bits:
  - SC[1:0] controls coherency scope: 0=wave, 1=group, 2=device, 3=system.
  - nt=True: Use streaming-optimized cache policies (recommended for streaming data).
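The width constraint in the notes above can be checked before choosing dtype and width. The following is a minimal sketch, assuming the load size is the element size in bytes multiplied by width. The helper names load_size_bytes and is_supported_load_size are hypothetical and are not part of the Mojo standard library.

```mojo
# Hypothetical helpers for illustration; not part of the Mojo standard library.
# Assumption: the load size in bytes is (element size in bytes) * (SIMD width).


fn load_size_bytes(element_size_bytes: Int, width: Int) -> Int:
    return element_size_bytes * width


fn is_supported_load_size(nbytes: Int) -> Bool:
    # The notes list 1, 2, 4, 8, and 16 byte loads as supported.
    return nbytes == 1 or nbytes == 2 or nbytes == 4 or nbytes == 8 or nbytes == 16


def main():
    # 4-byte elements (for example float32) with width 4 give a 16-byte load.
    print(is_supported_load_size(load_size_bytes(4, 4)))
    # 4-byte elements with width 8 would need a 32-byte load.
    print(is_supported_load_size(load_size_bytes(4, 8)))
```

A check like this only mirrors the documented list of load sizes; it does not query the hardware.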
Parameters:
- dtype (DType): The data type to load.
- width (Int): The SIMD vector width for vectorized loads.
- cache_policy (CacheOperation): Cache operation policy controlling cache behavior at all levels.
Args:
- src_resource (SIMD): Buffer resource descriptor created by make_buffer_resource().
- gds_offset (SIMD): Offset in elements (not bytes) from the base address in the resource.
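Because gds_offset counts elements rather than bytes, code that tracks byte offsets has to convert them first. The sketch below shows that conversion under the assumption that an element offset is the byte offset divided by the element size. The helper name element_offset_from_bytes is hypothetical, and the -1 return value for misaligned offsets is an illustrative convention only.

```mojo
# Hypothetical helper for illustration; not part of the Mojo standard library.


fn element_offset_from_bytes(byte_offset: Int, element_size_bytes: Int) -> Int:
    # Returns -1 when the byte offset does not fall on an element boundary.
    if byte_offset % element_size_bytes != 0:
        return -1
    return byte_offset // element_size_bytes


def main():
    # A byte offset of 64 into a buffer of 4-byte elements is element 16.
    print(element_offset_from_bytes(64, 4))
    # A byte offset of 6 is not aligned to a 4-byte element.
    print(element_offset_from_bytes(6, 4))
```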
Returns:
SIMD: SIMD vector containing the loaded data.