Mojo function
ds_read_tr8_b64
ds_read_tr8_b64[dtype: DType, //](shared_ptr: UnsafePointer[Scalar[dtype], shared_ptr.origin, address_space=AddressSpace.SHARED]) -> SIMD[dtype, 8]
Reads 64 bits per lane from an LDS (shared memory) transpose block using the TR8 layout and returns them as a SIMD[dtype, 8] vector of 8-bit elements.
Each 16-lane row reads 16x8 bytes from LDS and performs two interleaved 8x8 byte transposes, producing 8 transposed bytes per lane.
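As a rough host-side illustration of the sizes and the transpose involved, the sketch below models a single 8x8 byte transpose in plain Python. It is not the hardware's lane mapping: how the two 8x8 transposes are interleaved across the 16 lanes is not modeled, and the names `transpose_8x8`, `LANES_PER_ROW`, `BYTES_PER_LANE`, and `ROW_BYTES` are hypothetical helpers chosen for this example.

```python
# Host-side model only; this does not call the GPU intrinsic.

LANES_PER_ROW = 16   # lanes that cooperate on one row
BYTES_PER_LANE = 8   # 64 bits returned per lane, as SIMD[dtype, 8]
ROW_BYTES = LANES_PER_ROW * BYTES_PER_LANE  # bytes read from LDS per 16-lane row


def transpose_8x8(block):
    """Transpose an 8x8 block given as 8 rows of 8 byte values."""
    assert len(block) == 8 and all(len(row) == 8 for row in block)
    return [[block[r][c] for r in range(8)] for c in range(8)]


# Fill a block so that element (r, c) holds r * 8 + c.
block = [[r * 8 + c for c in range(8)] for r in range(8)]
transposed = transpose_8x8(block)

# After the transpose, row c of the result is column c of the input.
print(ROW_BYTES)
print(transposed[0])
```

In this model, each row of `transposed` corresponds to the 8 bytes one lane would hold after a plain 8x8 transpose; the real instruction applies two such transposes per 16-lane row.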
Notes:
- Only supported on AMD GPUs (CDNA4+).
- Maps directly to the llvm.amdgcn.ds.read.tr8.b64 intrinsic.
- The return value goes through a v2i32 intermediate to avoid a crash in the LLVM type legalizer.
Parameters:
- dtype (DType): Data type of the elements (must be an 8-bit type).
Args:
- shared_ptr (UnsafePointer): Pointer to the LDS transpose block.
Returns:
SIMD: A SIMD[dtype, 8] vector of 8-bit elements holding the 8 transposed bytes for the calling lane.