Mojo struct
AMDBufferResource
@register_passable(trivial)
struct AMDBufferResource
128-bit descriptor for a buffer resource on AMD GPUs.
Used for buffer_load/buffer_store instructions.
Fields
- desc (SIMD[DType.uint32, 4]): The 128-bit buffer descriptor encoded as four 32-bit values.
Implemented traits
AnyType, Copyable, ImplicitlyCopyable, Movable, UnknownDestructibility
Aliases
__copyinit__is_trivial
alias __copyinit__is_trivial = True
__del__is_trivial
alias __del__is_trivial = True
__moveinit__is_trivial
alias __moveinit__is_trivial = True
Methods
__init__
__init__[dtype: DType](gds_ptr: UnsafePointer[Scalar[dtype], address_space=address_space, mut=mut, origin=origin], num_records: Int = Int.__init__[UInt32](SIMD[DType.uint32, 1](max_or_inf[DType.uint32]()))) -> Self
Constructs an AMD buffer resource descriptor.
Parameters:
- dtype (DType): Data type of the buffer elements.
Args:
- gds_ptr (UnsafePointer): Pointer to the buffer in global memory.
- num_records (Int): Number of records in the buffer.
__init__() -> Self
Constructs a zeroed AMD buffer resource descriptor.
get_base_ptr
get_base_ptr(self) -> Int
Gets the base pointer address from the buffer resource descriptor.
Returns:
Int: The base pointer address as an integer.
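As a rough sketch of how the two constructors and get_base_ptr fit together, the hypothetical helper below wraps a global-memory pointer in a descriptor and reads the base address back. The import path gpu.intrinsics, the helper name describe_buffer, and the plain UnsafePointer[Float32] argument type are assumptions not confirmed by this page; only the constructor and method signatures come from the reference above.

```mojo
# Assumed import path; adjust to wherever AMDBufferResource lives in your Mojo version.
from gpu.intrinsics import AMDBufferResource


# Hypothetical helper, intended to be called from code running on an AMD GPU.
fn describe_buffer(ptr: UnsafePointer[Float32], num_elements: Int) -> Int:
    # dtype is inferred from the pointer; num_elements is passed as num_records.
    var resource = AMDBufferResource(ptr, num_elements)

    # The no-argument constructor yields a zeroed descriptor (a placeholder here).
    var empty = AMDBufferResource()
    _ = empty

    # The base address stored in the descriptor, returned as an integer.
    return resource.get_base_ptr()
```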
load
load[dtype: DType, width: Int, *, cache_policy: CacheOperation = CacheOperation(0)](self, vector_offset: Int32, *, scalar_offset: Int32 = 0) -> SIMD[dtype, width]
Loads data from the buffer using the AMD buffer load intrinsic.
Parameters:
- dtype (DType): Data type to load.
- width (Int): Number of elements to load.
- cache_policy (CacheOperation): Cache operation policy.
Args:
- vector_offset (Int32): Offset in elements from the base pointer.
- scalar_offset (Int32): Additional scalar offset in elements.
Returns:
SIMD: SIMD vector containing the loaded data.
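A minimal sketch of a call to load, assuming a descriptor has already been constructed and that the code runs on an AMD GPU. The helper name load_vec4 and the import path are hypothetical; the parameter order follows the signature above.

```mojo
from gpu.intrinsics import AMDBufferResource  # assumed import path


# Hypothetical helper: reads four consecutive float32 values.
fn load_vec4(
    resource: AMDBufferResource, element_offset: Int32
) -> SIMD[DType.float32, 4]:
    # dtype and width are compile-time parameters; the offset is in elements.
    # cache_policy and scalar_offset are left at their defaults.
    return resource.load[DType.float32, 4](element_offset)
```

Because dtype and width appear only in the return type, they cannot be inferred from the arguments and are spelled out at the call site.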
load_to_lds
load_to_lds[dtype: DType, *, width: Int = 1, cache_policy: CacheOperation = CacheOperation(0)](self, vector_offset: Int32, shared_ptr: UnsafePointer[Scalar[dtype], address_space=AddressSpace(3)], *, scalar_offset: Int32 = 0)
Loads data from global memory and stores to shared memory.
Copies data from global memory directly into shared memory (LDS, the local data share) without staging it in registers.
Parameters:
- dtype (DType): The dtype of the data to be loaded.
- width (Int): The SIMD vector width.
- cache_policy (CacheOperation): Cache operation policy controlling cache behavior at all levels.
Args:
- vector_offset (Int32): Vector memory offset in elements (per thread).
- shared_ptr (UnsafePointer): Shared memory address.
- scalar_offset (Int32): Scalar memory offset in elements (shared across wave).
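The split between the per-thread vector offset and the wave-wide scalar offset might be used as in the sketch below. The helper name stage_to_lds, its argument names, and both import paths are assumptions; the shared-memory pointer is taken as an argument so that no allocation API has to be guessed.

```mojo
from gpu.intrinsics import AMDBufferResource  # assumed import path
from gpu.memory import AddressSpace  # assumed import path


# Hypothetical helper: copies one float32 per thread from global memory to LDS.
fn stage_to_lds(
    resource: AMDBufferResource,
    shared_ptr: UnsafePointer[Float32, address_space = AddressSpace(3)],
    thread_offset: Int32,
    tile_offset: Int32,
):
    # thread_offset differs per thread; tile_offset is shared by the whole wave.
    # width and cache_policy keep their defaults (width = 1).
    resource.load_to_lds[DType.float32](
        thread_offset, shared_ptr, scalar_offset=tile_offset
    )
```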
store
store[dtype: DType, width: Int, *, cache_policy: CacheOperation = CacheOperation(0)](self, vector_offset: Int32, val: SIMD[dtype, width], *, scalar_offset: Int32 = 0)
Stores a value held in registers to global memory, with cache behavior selected through cache_policy.
Note:
- Only supported on AMD GPUs.
- Provides high-level cache control via CacheOperation enum values.
- Maps directly to llvm.amdgcn.raw.buffer.store intrinsics.
- Cache control bits:
  - SC[1:0] controls coherency scope: 0=wave, 1=group, 2=device, 3=system.
  - nt=True: Use streaming-optimized cache policies (recommended for streaming data).
Parameters:
- dtype (DType): The data type.
- width (Int): The SIMD vector width.
- cache_policy (CacheOperation): Cache operation policy controlling cache behavior at all levels.
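Args:
- vector_offset (Int32): Vector memory offset in elements (per thread).
- val (SIMD): Value to write.
- scalar_offset (Int32): Scalar memory offset in elements (shared across wave).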
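A minimal sketch of a store call, mirroring the signature above. The helper name store_vec4 and the import path are hypothetical, and the code is meant to run on an AMD GPU with an already constructed descriptor.

```mojo
from gpu.intrinsics import AMDBufferResource  # assumed import path


# Hypothetical helper: writes four float32 values at an element offset.
fn store_vec4(
    resource: AMDBufferResource,
    element_offset: Int32,
    value: SIMD[DType.float32, 4],
):
    # The default cache_policy and scalar_offset are used here.
    resource.store[DType.float32, 4](element_offset, value)
```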