Contains information about GPU architectures and their capabilities.
This module provides detailed specifications for various GPU models, including NVIDIA, AMD, and Apple GPUs. It covers compute capabilities, memory specifications, thread organization, and performance characteristics.
GPU Target Configuration Guide
When adding support for a new GPU architecture, you must create a target
configuration function that returns a _TargetType. This guide explains the
components of the MLIR target configuration, with special focus on the
data_layout string.
MLIR Target Components
Each GPU target function returns an MLIR kgen.target attribute with these
fields:
- triple: Target triple (e.g., "nvptx64-nvidia-cuda", "amdgcn-amd-amdhsa").
- arch: Architecture name (e.g., "sm_80", "gfx942", "apple-m4").
- features: Target-specific features (e.g., "+ptx81,+sm_80").
- tune_cpu: Optimization target (usually same as arch, can differ for tuning).
- data_layout: LLVM data layout string (explained in detail below).
- index_bit_width: Bit width for index types (usually 64).
- simd_bit_width: SIMD register width (usually 128 for modern GPUs).
Understanding Data Layout Strings
The data_layout string describes memory layout characteristics for the target
architecture. It follows LLVM's data layout specification format: https://llvm.org/docs/LangRef.html#data-layout
and is used by the compiler to make decisions about memory access patterns,
type layouts, and optimizations.
Format Overview
The string consists of specifications separated by dashes (-):
- Endianness: `e` (little-endian) or `E` (big-endian).
- Pointers: `p[addr_space]:size:abi:pref:idx`.
- Integers: `i<size>:<abi>:<pref>`.
- Floats: `f<size>:<abi>:<pref>`.
- Vectors: `v<size>:<abi>:<pref>`.
- Native widths: `n<size>:<size>:...`.
- Stack alignment: `S<size>`.
- Address space: `A<number>`.
- Mangling: `m:<style>` (e.g., `m:e` for ELF).
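To make the dash-separated format concrete, here is an illustrative Python sketch (not part of this module; the function name is ours) that splits a data layout string into its component specifications:

```python
# Illustrative helper (not part of this module): split an LLVM data
# layout string into its dash-separated component specifications.
def split_data_layout(layout: str) -> dict:
    parsed = {"endianness": None, "pointers": [], "other": []}
    for spec in layout.split("-"):
        if spec in ("e", "E"):
            parsed["endianness"] = "little" if spec == "e" else "big"
        elif spec.startswith("p"):
            parsed["pointers"].append(spec)
        else:
            parsed["other"].append(spec)
    return parsed

# The NVIDIA layout discussed later in this guide:
nvidia = "e-p3:32:32-p4:32:32-p5:32:32-p6:32:32-p7:32:32-i64:64-i128:128-i256:256-v16:16-v32:32-n16:32:64"
print(split_data_layout(nvidia)["endianness"])  # little
```

Note that a full parser would need special cases for multi-character keys such as `ni` and `m:` used by the AMD layouts; this sketch only covers the common single-letter prefixes.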
Component Details
Endianness
- `e`: Little-endian (all modern GPUs use this).
- `E`: Big-endian (rarely used).
Pointer Specifications: p[addr_space]:size:abi:pref:idx
Defines pointer sizes and alignments for different memory spaces:
- Address space: Optional number (0-9) specifying memory type:
  - `p` or `p0`: Generic/flat address space.
  - `p1`: Global memory (AMD) or device memory.
  - `p2`: Constant memory (AMD).
  - `p3`: Shared/local memory (NVIDIA) or local memory (AMD).
  - `p4`: Constant memory (NVIDIA) or generic memory (AMD).
  - `p5`: Local/private memory (NVIDIA/AMD).
  - `p6`-`p9`: Vendor-specific address spaces.
- size: Pointer size in bits.
- abi: ABI-required alignment in bits.
- pref: Preferred alignment in bits (optional).
- idx: Index type size in bits (optional).
Examples:
- `p3:32:32` means shared memory uses 32-bit pointers with 32-bit alignment.
- `p:64:64:64` means generic pointers are 64 bits with 64-bit alignment.
- `p7:160:256:256:32` means address space 7 uses 160-bit pointers with 256-bit alignment and 32-bit indices.
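As a hedged illustration of how these fields decompose, the following Python sketch (the helper name is ours, not from this module) parses a pointer specification into its named fields:

```python
# Illustrative helper (not from this module): decompose a pointer
# specification like "p7:160:256:256:32" into its named fields.
# Fields after the address space are, in order: size, abi, pref, idx.
def parse_pointer_spec(spec: str) -> dict:
    head, *fields = spec.split(":")
    # "p" with no number means address space 0 (generic/flat).
    addr_space = int(head[1:]) if len(head) > 1 else 0
    result = {"addr_space": addr_space}
    for name, value in zip(["size", "abi", "pref", "idx"], fields):
        result[name] = int(value)
    return result

print(parse_pointer_spec("p7:160:256:256:32"))
# {'addr_space': 7, 'size': 160, 'abi': 256, 'pref': 256, 'idx': 32}
```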
Integer Specifications: i<size>:<abi>:<pref>
Defines alignment for integer types:
- size: Integer size in bits (1, 8, 16, 32, 64, 128, 256, etc.).
- abi: Minimum ABI alignment in bits.
- pref: Preferred alignment in bits (optional, defaults to abi).
Examples:
- `i64:64` means 64-bit integers have 64-bit alignment.
- `i128:128` means 128-bit integers have 128-bit alignment.
- `i1:8:8` means 1-bit booleans are stored in 8-bit aligned bytes.
Float Specifications: f<size>:<abi>:<pref>
Similar to integers but for floating-point types:
Examples:
- `f32:32:32` means 32-bit floats have 32-bit alignment.
- `f64:64:64` means 64-bit doubles have 64-bit alignment.
Vector Specifications: v<size>:<abi>:<pref>
Defines alignment for vector types:
- size: Vector size in bits.
- abi: ABI alignment in bits.
- pref: Preferred alignment in bits (optional).
Examples:
- `v16:16` means 16-bit vectors are aligned to 16 bits.
- `v128:128:128` means 128-bit vectors have 128-bit alignment.
Native Integer Widths: n<size>:<size>:...
Specifies which integer widths are "native" (efficient) for the target. The compiler will prefer these sizes for operations.
Examples:
- `n16:32:64` means 16, 32, and 64-bit operations are efficient.
- `n32:64` means 32 and 64-bit operations are efficient.
- `n8:16:32` means 8, 16, and 32-bit operations are efficient.
Stack Alignment: S<size>
Specifies natural stack alignment in bits.
Example: S32 means 32-bit stack alignment.
Address Space: A<number>
Specifies the default address space for allocations.
Example: A5 means use address space 5 by default.
Non-Integral Pointers: ni:<space>:<space>:...
Lists address spaces where pointers cannot be cast to integers.
Example: ni:7:8:9 means address spaces 7, 8, and 9 have non-integral pointers.
Vendor-Specific Patterns
NVIDIA GPUs (CUDA/PTX)
Typical data layout for NVIDIA GPUs (sm_60 and later):
e-p3:32:32-p4:32:32-p5:32:32-p6:32:32-p7:32:32-i64:64-i128:128-i256:256-v16:16-v32:32-n16:32:64

Breakdown:
- `e`: Little-endian.
- `p3:32:32`: Shared memory pointers are 32-bit.
- `p4:32:32`: Constant memory pointers are 32-bit.
- `p5:32:32`: Local memory pointers are 32-bit.
- `p6:32:32`, `p7:32:32`: NVIDIA-specific address spaces.
- `i64:64`, `i128:128`, `i256:256`: Integer alignments.
- `v16:16`, `v32:32`: Vector alignments for warp operations.
- `n16:32:64`: Native integer widths (16, 32, and 64-bit operations).
Note: NVIDIA GPUs use address-space-specific 32-bit pointers for shared, constant, and local memory, while the default address space (not specified) uses 64-bit pointers. This matches the PTX memory model.
AMD GPUs (ROCm/HIP)
Typical data layout for AMD GPUs (CDNA and RDNA):
e-m:e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9

AMD GPUs use more address spaces and have more complex specifications:
- `m:e`: ELF mangling style.
- `p:64:64`: Default pointers are 64-bit (unified addressing).
- `p1:64:64`: Global memory uses 64-bit pointers.
- `p2:32:32`: Constant memory uses 32-bit pointers.
- `p3:32:32`: Local/shared memory uses 32-bit pointers.
- `p4:64:64`: Generic address space uses 64-bit pointers.
- `p5:32:32`: Private memory uses 32-bit pointers.
- `p7`, `p8`, `p9`: Complex buffer descriptors (160, 128, and 192 bits).
- Extensive vector sizes (`v16` through `v2048`) for wavefront operations.
- `n32:64`: Native integer widths.
- `S32`: 32-bit stack alignment.
- `A5`: Default address space is 5.
- `G1`: Global address space is 1.
- `ni:7:8:9`: Address spaces 7, 8, and 9 have non-integral pointers.
Apple Metal GPUs
Typical data layout for Apple Silicon:
e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32

Apple GPUs have a unified memory architecture:
- `p:64:64:64`: 64-bit pointers with explicit preferred alignment (unified memory).
- Explicit specifications for all integer sizes (`i1`, `i8`, `i16`, `i32`, `i64`).
- Explicit float alignments (`f32:32:32`, `f64:64:64`).
- Comprehensive vector size coverage (`v16` through `v1024`).
- `n8:16:32`: Native integer widths (8, 16, and 32-bit operations).
How to Obtain Data Layout Strings
When adding support for a new GPU architecture, obtain the data layout string using these methods:
Method 1: Query LLVM/Clang (Recommended)
Use Clang to query the target's default data layout:
For NVIDIA GPUs:
echo 'target triple = "nvptx64-nvidia-cuda"' > test.ll
clang -S -emit-llvm test.ll -o - | grep datalayout

For AMD GPUs:
echo 'target triple = "amdgcn-amd-amdhsa"' > test.ll
clang -S -emit-llvm test.ll -o - | grep datalayout

(The `-emit-llvm` flag is needed so the output is LLVM IR, which contains the `target datalayout` line; plain assembly output does not.)

Method 2: Consult LLVM Source Code
Check the LLVM source for target data layout definitions:
- NVIDIA: `llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp` (see `computeDataLayout()`).
- AMD: `llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp` (see `getGPUDataLayout()`).
Method 3: Reference Similar GPUs
For GPUs in the same architecture family, the data layout is often identical:
- All NVIDIA Ampere/Ada/Hopper GPUs (sm_80+) use the same data layout.
- AMD CDNA GPUs share similar layouts.
- Apple Metal GPUs have consistent patterns across generations.
When in doubt, use the data layout from a GPU in the same family.
Method 4: Consult Vendor Documentation
Refer to official programming guides and specifications:
- NVIDIA: LLVM NVPTX Usage Guide, CUDA Programming Guide, PTX ISA documentation.
- AMD: ROCm documentation, LLVM AMDGPU documentation.
- Apple: Metal Programming Guide, Metal Shading Language Specification.
The LLVM NVPTX documentation recommends this data layout for 64-bit NVIDIA GPUs:
e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64

Note: The data layouts in this file use address-space-specific pointer
specifications (p3, p4, p5, etc.) rather than the generic p:64:64:64. This
provides more precise control over memory access patterns for different memory
spaces in GPU kernels.
Field-by-Field Explanation
Triple
The target triple identifies the architecture, vendor, and operating system:
- NVIDIA: `nvptx64-nvidia-cuda` (64-bit) or `nvptx-nvidia-cuda` (32-bit).
- AMD: `amdgcn-amd-amdhsa` (HSA runtime).
- Apple: `air64-apple-macosx` (Metal on macOS).
Arch
The architecture name specifies the GPU generation:
- NVIDIA: `sm_XX` where XX is the compute capability (e.g., `sm_80` for compute 8.0).
  - Find compute capability at https://developer.nvidia.com/cuda-gpus.
  - Format: `sm_XY` maps to compute capability X.Y; `sm_XYZ` maps to XY.Z.
- AMD: `gfxXXXX` where XXXX is the GFX version (e.g., `gfx942` for MI300X).
  - Find GFX version in ROCm documentation or GPU specifications.
- Apple: `apple-mX` where X is the chip generation (e.g., `apple-m4`).
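The `sm_XY`/`sm_XYZ` convention can be sketched in Python (an illustrative helper, not part of this module):

```python
# Illustrative helper (not part of this module): map an NVIDIA compute
# capability (major, minor) to its sm_XY / sm_XYZ architecture name.
# Feature-variant suffixes such as the "a" in sm_90a are appended
# separately and are not derived from the capability numbers.
def arch_from_compute(major: int, minor: int) -> str:
    return f"sm_{major}{minor}"

print(arch_from_compute(8, 0))   # sm_80
print(arch_from_compute(9, 0))   # sm_90
print(arch_from_compute(12, 1))  # sm_121
```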
Features
Target-specific features enabled for code generation:
- NVIDIA: `+ptxXX,+sm_YY` where XX is the PTX version and YY is the compute capability.
  - The PTX version should match your CUDA toolkit version (see PTX ISA docs).
  - Example: `+ptx85,+sm_90a` enables PTX 8.5 and compute 9.0a features.
  - Q: Is specifying the PTX version redundant? A: No; the PTX version determines available instructions and features independently of compute capability.
- AMD: Often empty (`""`), as features are implied by the architecture.
- Apple: Often empty (`""`) for Metal GPUs.
Tune CPU
Specifies the optimization target for code generation:
- Usually the same as `arch` (e.g., `tune_cpu = "sm_90a"`).
- Can differ if you want to optimize for a different microarchitecture while maintaining compatibility (e.g., `arch = "sm_80"`, `tune_cpu = "sm_90a"`).
- Some older GPU entries omit this field (see GTX 970, GTX 1080 Ti).
Index Bit Width
The bit width for index types used in address calculations:
- 32-bit systems: `index_bit_width = 32`.
- 64-bit systems: `index_bit_width = 64`.
- Most modern GPUs use 64-bit indexing for large memory spaces.
SIMD Bit Width
The width of SIMD registers in bits:
- Modern GPUs: Usually `simd_bit_width = 128` (128-bit vector operations).
- This represents the native vector width for efficient operations.
- How to find this: Based on warp/wavefront width and register architecture:
  - NVIDIA: 128 bits (4 x 32-bit values per warp operation).
  - AMD: 128 bits for CDNA/RDNA architectures.
  - Apple: 128 bits for Metal GPUs.
Step-by-Step Guide for Adding a New GPU
Follow these steps to add support for a new GPU architecture:
Step 1: Gather GPU Information
Collect these specifications for your GPU:
- Model name: e.g., "H100", "MI300X", "M4".
- Compute capability (NVIDIA) or GFX version (AMD) or Metal version (Apple).
- Architecture family: Identify the family (e.g., Hopper, CDNA3, Apple M series).
- SM/CU count: Number of streaming multiprocessors or compute units.
- Target triple: Standard LLVM triple for the vendor.
- Data layout string: Obtain using methods described above.
To find SM count for NVIDIA GPUs, use this CUDA code:
#include <cstdio>
#include <cuda_runtime.h>

void printMultiProcessorCount() {
    int dev = 0;
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, dev);
    printf("Number of SMs: %d\n", deviceProp.multiProcessorCount);
}

Or check vendor specifications:
- NVIDIA: https://developer.nvidia.com/cuda-gpus.
- AMD: ROCm device specifications.
Step 2: Create the Target Function
Add a new function that returns the MLIR target configuration.
Example for NVIDIA GPU:
fn _get_your_gpu_target() -> _TargetType:
"""Creates an MLIR target configuration for Your GPU.
Returns:
MLIR target configuration for Your GPU.
"""
return __mlir_attr[
`#kgen.target<triple = "nvptx64-nvidia-cuda", `,
`arch = "sm_90a", `,
`features = "+ptx85,+sm_90a", `,
`tune_cpu = "sm_90a", `,
`data_layout = "e-p3:32:32-p4:32:32-p5:32:32-p6:32:32-p7:32:32-i64:64-i128:128-i256:256-v16:16-v32:32-n16:32:64",`,
`index_bit_width = 64,`,
`simd_bit_width = 128`,
`> : !kgen.target`,
]

Place this function with other GPU target functions in this file (search for
_get_*_target() functions).
Step 3: Create the GPUInfo Alias
Define the GPU characteristics using the appropriate architecture family:
alias YourGPU = GPUInfo.from_family(
family=NvidiaHopperFamily, # Choose the appropriate family
name="Your GPU",
vendor=Vendor.NVIDIA_GPU,
api="cuda",
arch_name="hopper",
compute=9.0, # Must match arch (9.0 -> sm_90, 12.1 -> sm_121)
version="sm_90a",
sm_count=132, # Number of streaming multiprocessors
)

Place this alias with other GPU aliases in this file.
Step 4: Update _get_info_from_target
Add your architecture to the constraint list in the _get_info_from_target
function:
constrained[
StaticString(target_arch)
in (
# NVIDIA
StaticString("cuda"),
StaticString("52"),
StaticString("90a"), # Add your architecture here
# ... rest of architectures ...
),
"the target architecture '",
target_arch0,
"' is invalid or not currently supported",
]()

Then add the mapping in the @parameter block:
@parameter
if target_arch == "52":
return materialize[GTX970]()
elif target_arch == "90a": # Add your mapping here
return materialize[YourGPU]()
    # ... rest of mappings ...

Note: The target_arch has the "sm_" prefix stripped, so "sm_90a" becomes
"90a".
Note: GPUs are currently 1:1 with the target_arch string. This will change in
the future to support multiple GPUs per target_arch.
Step 5: Update GPUInfo.target Method
Add the target mapping in the target() method of the GPUInfo struct:
fn target(self) -> _TargetType:
"""Gets the MLIR target configuration for this GPU.
Returns:
MLIR target configuration for the GPU.
"""
if self.name == "NVIDIA Tesla P100":
return _get_teslap100_target()
if self.name == "Your GPU": # Add your GPU here
return _get_your_gpu_target()
    # ... rest of mappings ...

Step 6: Build and Test
Build the standard library to verify your changes:
./bazelw build //mojo/stdlib/stdlib

Test with a simple GPU program:

MODULAR_MOJO_MAX_IMPORT_PATH=bazel-bin/mojo/stdlib/stdlib mojo your_test.mojo

Run existing GPU tests to ensure nothing broke:

./bazelw test //mojo/stdlib/test/gpu/...

Common Pitfalls
Avoid these common mistakes when adding GPU support:
- Mismatched compute capability: Ensure `compute` matches `arch` (e.g., `compute=9.0` with `arch="sm_90a"`).
- Incorrect pointer sizes: Verify address space pointer sizes match hardware capabilities.
- Missing vector alignments: Include all vector sizes your kernels will use.
- Wrong endianness: All modern GPUs are little-endian (use `e`).
- Inconsistent with LLVM: The data layout must match LLVM's target definition.
- Copy-paste errors: Double-check field values when adapting from similar GPUs.
- Forgetting to update all 5 locations: Target function, alias, constraint list, parameter block, and target() method.
- PTX/driver version mismatch: Ensure the PTX version is supported by your CUDA driver.
Validation Checklist
Before submitting your GPU addition:
- Target function created and documented.
- GPUInfo alias defined with correct family.
- Architecture added to constraint list in `_get_info_from_target`.
- Mapping added to `@parameter` block in `_get_info_from_target`.
- Mapping added to `GPUInfo.target()` method.
- Data layout string validated against LLVM documentation.
- Compute capability matches architecture name.
- SM/CU count verified against official specifications.
- Standard library builds successfully.
- Existing tests pass.
- Manual testing with simple GPU kernel.
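Some checklist items can be machine-checked. As a sketch of the compute/arch consistency check (the function name is ours, assuming the sm_XY/sm_XYZ convention described earlier, where the last digit is the minor version):

```python
# Illustrative consistency check (not part of this module): verify that
# a compute capability agrees with its sm_* version string. Assumes the
# minor version is the last digit and any trailing letters (e.g. the
# "a" in sm_90a) are feature-variant suffixes.
def compute_matches_version(compute: float, version: str) -> bool:
    if not version.startswith("sm_"):
        return False
    digits = version[3:].rstrip("abcdefghijklmnopqrstuvwxyz")
    major, minor = digits[:-1], digits[-1]
    return abs(compute - (int(major) + int(minor) / 10)) < 1e-6

print(compute_matches_version(9.0, "sm_90a"))   # True
print(compute_matches_version(12.1, "sm_121"))  # True
print(compute_matches_version(8.6, "sm_80"))    # False
```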
Related Files
- `sys/info.mojo`: Defines `_TargetType` as `!kgen.target` and the `CompilationTarget` struct.
- LLVM Documentation: https://llvm.org/docs/LangRef.html#data-layout (complete data layout specification).
- LLVM NVPTX Usage: https://llvm.org/docs/NVPTXUsage.html (NVIDIA-specific guidance).
Examples in This File
See real-world examples by searching for these functions:
- `_get_h100_target()`: NVIDIA Hopper H100 (compute 9.0).
- `_get_mi300x_target()`: AMD CDNA3 MI300X.
- `_get_metal_m4_target()`: Apple Metal M4.
- `_get_rtx5090_target()`: NVIDIA Blackwell consumer GPU.
Each example demonstrates the complete target configuration for that GPU family.
Aliases
A10
alias A10 = GPUInfo.from_family(NvidiaAmpereWorkstationFamily, "A10", Vendor.NVIDIA_GPU, "cuda", "ampere", 8.6, "sm_86", 72)
A100
alias A100 = GPUInfo.from_family(NvidiaAmpereDatacenterFamily, "A100", Vendor.NVIDIA_GPU, "cuda", "ampere", 8, "sm_80", 108)
AMDCDNA3Family
alias AMDCDNA3Family = AcceleratorArchitectureFamily(64, 2048, 65536, 65536, 1024)
AMDCDNA4Family
alias AMDCDNA4Family = AcceleratorArchitectureFamily(64, 2048, 163840, 65536, 1024)
AMDRDNAFamily
alias AMDRDNAFamily = AcceleratorArchitectureFamily(32, 1024, 32768, 32768, 1024)
AppleMetalFamily
alias AppleMetalFamily = AcceleratorArchitectureFamily(32, 1024, 32768, 65536, 1024)
B100
alias B100 = GPUInfo.from_family(NvidiaBlackwellFamily, "B100", Vendor.NVIDIA_GPU, "cuda", "blackwell", 10, "sm_100a", 132)
B200
alias B200 = GPUInfo.from_family(NvidiaBlackwellFamily, "B200", Vendor.NVIDIA_GPU, "cuda", "blackwell", 10, "sm_100a", 148)
DGXSpark
alias DGXSpark = GPUInfo.from_family(NvidiaBlackwellFamily, "DGX Spark", Vendor.NVIDIA_GPU, "cuda", "blackwell", 12.1, "sm_121", 48)
GTX1060
alias GTX1060 = GPUInfo.from_family(NvidiaPascalFamily, "NVIDIA GeForce GTX 1060", Vendor.NVIDIA_GPU, "cuda", "pascal", 6.1, "sm_61", 10)
GTX1080Ti
alias GTX1080Ti = GPUInfo.from_family(NvidiaPascalFamily, "NVIDIA GeForce GTX 1080 Ti", Vendor.NVIDIA_GPU, "cuda", "pascal", 6.1, "sm_61", 28)
GTX970
alias GTX970 = GPUInfo.from_family(NvidiaMaxwellFamily, "NVIDIA GeForce GTX 970", Vendor.NVIDIA_GPU, "cuda", "maxwell", 5.2, "sm_52", 13)
H100
alias H100 = GPUInfo.from_family(NvidiaHopperFamily, "H100", Vendor.NVIDIA_GPU, "cuda", "hopper", 9, "sm_90a", 132)
JetsonThor
alias JetsonThor = GPUInfo.from_family(NvidiaBlackwellFamily, "Jetson Thor", Vendor.NVIDIA_GPU, "cuda", "blackwell", 11, "sm_110", 20)
L4
alias L4 = GPUInfo.from_family(NvidiaAdaFamily, "L4", Vendor.NVIDIA_GPU, "cuda", "ada", 8.9, "sm_89", 58)
MetalM1
alias MetalM1 = GPUInfo.from_family(AppleMetalFamily, "M1", Vendor.APPLE_GPU, "metal", "apple-m1", 3, "metal_3", 8)
MetalM2
alias MetalM2 = GPUInfo.from_family(AppleMetalFamily, "M2", Vendor.APPLE_GPU, "metal", "apple-m2", 3, "metal_3", 10)
MetalM3
alias MetalM3 = GPUInfo.from_family(AppleMetalFamily, "M3", Vendor.APPLE_GPU, "metal", "apple-m3", 3, "metal_3", 10)
MetalM4
alias MetalM4 = GPUInfo.from_family(AppleMetalFamily, "M4", Vendor.APPLE_GPU, "metal", "apple-m4", 4, "metal_4", 10)
MI300X
alias MI300X = GPUInfo.from_family(AMDCDNA3Family, "MI300X", Vendor.AMD_GPU, "hip", "gfx942", 9.4, "CDNA3", 304)
MI355X
alias MI355X = GPUInfo.from_family(AMDCDNA4Family, "MI355X", Vendor.AMD_GPU, "hip", "gfx950", 9.5, "CDNA4", 256)
NoGPU
alias NoGPU = GPUInfo("NoGPU", Vendor.NO_GPU, "none", "no_gpu", 0, "", 0, 0, 0, 0, 0, 0)
NvidiaAdaFamily
alias NvidiaAdaFamily = AcceleratorArchitectureFamily(32, 1536, 102400, 65536, 1024)
NvidiaAmpereDatacenterFamily
alias NvidiaAmpereDatacenterFamily = AcceleratorArchitectureFamily(32, 2048, 167936, 65536, 1024)
NvidiaAmpereEmbeddedFamily
alias NvidiaAmpereEmbeddedFamily = AcceleratorArchitectureFamily(32, 1536, 167936, 65536, 1024)
NvidiaAmpereWorkstationFamily
alias NvidiaAmpereWorkstationFamily = AcceleratorArchitectureFamily(32, 1536, 102400, 65536, 1024)
NvidiaBlackwellConsumerFamily
alias NvidiaBlackwellConsumerFamily = AcceleratorArchitectureFamily(32, 1536, 102400, 65536, 1024)
NvidiaBlackwellFamily
alias NvidiaBlackwellFamily = AcceleratorArchitectureFamily(32, 2048, 233472, 65536, 1024)
NvidiaHopperFamily
alias NvidiaHopperFamily = AcceleratorArchitectureFamily(32, 2048, 233472, 65536, 1024)
NvidiaMaxwellFamily
alias NvidiaMaxwellFamily = AcceleratorArchitectureFamily(32, 2048, 98304, 65536, 1024)
NvidiaPascalFamily
alias NvidiaPascalFamily = AcceleratorArchitectureFamily(32, 2048, 65536, 65536, 1024)
NvidiaTuringFamily
alias NvidiaTuringFamily = AcceleratorArchitectureFamily(32, 2048, 65536, 32768, 1024)
OrinNano
alias OrinNano = GPUInfo.from_family(NvidiaAmpereEmbeddedFamily, "Orin Nano", Vendor.NVIDIA_GPU, "cuda", "ampere", 8.7, "sm_87", 8)
Radeon6900
alias Radeon6900 = GPUInfo.from_family(AMDRDNAFamily, "Radeon 6900", Vendor.AMD_GPU, "hip", "gfx1102", 10.3, "RDNA2", 60)
Radeon7600
alias Radeon7600 = GPUInfo.from_family(AMDRDNAFamily, "Radeon 7600", Vendor.AMD_GPU, "hip", "gfx1102", 11, "RDNA3", 32)
Radeon7800
alias Radeon7800 = GPUInfo.from_family(AMDRDNAFamily, "Radeon 7800/7700", Vendor.AMD_GPU, "hip", "gfx1101", 11, "RDNA3", 60)
Radeon780m
alias Radeon780m = GPUInfo.from_family(AMDRDNAFamily, "Radeon 780M", Vendor.AMD_GPU, "hip", "gfx1103", 11, "RDNA3", 12)
Radeon7900
alias Radeon7900 = GPUInfo.from_family(AMDRDNAFamily, "Radeon 7900", Vendor.AMD_GPU, "hip", "gfx1100", 11, "RDNA3", 96)
Radeon8060s
alias Radeon8060s = GPUInfo.from_family(AMDRDNAFamily, "Radeon 8060S", Vendor.AMD_GPU, "hip", "gfx1151", 11.5, "RDNA3.5", 40)
Radeon860m
alias Radeon860m = GPUInfo.from_family(AMDRDNAFamily, "Radeon 860M", Vendor.AMD_GPU, "hip", "gfx1152", 11.5, "RDNA3.5", 8)
Radeon880m
alias Radeon880m = GPUInfo.from_family(AMDRDNAFamily, "Radeon 880M", Vendor.AMD_GPU, "hip", "gfx1150", 11.5, "RDNA3.5", 12)
Radeon9060
alias Radeon9060 = GPUInfo.from_family(AMDRDNAFamily, "Radeon 9060", Vendor.AMD_GPU, "hip", "gfx1200", 12, "RDNA4", 32)
Radeon9070
alias Radeon9070 = GPUInfo.from_family(AMDRDNAFamily, "Radeon 9070", Vendor.AMD_GPU, "hip", "gfx1201", 12, "RDNA4", 64)
RTX2060
alias RTX2060 = GPUInfo.from_family(NvidiaTuringFamily, "RTX2060", Vendor.NVIDIA_GPU, "cuda", "turing", 7.5, "sm_75", 30)
RTX3090
alias RTX3090 = GPUInfo.from_family(NvidiaAmpereWorkstationFamily, "NVIDIA GeForce RTX 3090", Vendor.NVIDIA_GPU, "cuda", "ampere", 8.6, "sm_86", 82)
RTX4090
alias RTX4090 = GPUInfo.from_family(NvidiaAdaFamily, "RTX4090", Vendor.NVIDIA_GPU, "cuda", "ada lovelace", 8.9, "sm_89", 128)
RTX4090m
alias RTX4090m = GPUInfo.from_family(NvidiaAdaFamily, "RTX4090m", Vendor.NVIDIA_GPU, "cuda", "ada lovelace", 8.9, "sm_89", 76)
RTX5090
alias RTX5090 = GPUInfo.from_family(NvidiaBlackwellConsumerFamily, "RTX5090", Vendor.NVIDIA_GPU, "cuda", "blackwell", 12, "sm_120a", 170)
TeslaP100
alias TeslaP100 = GPUInfo.from_family(NvidiaPascalFamily, "NVIDIA Tesla P100", Vendor.NVIDIA_GPU, "cuda", "pascal", 6, "sm_60", 56)
Structs
- `AcceleratorArchitectureFamily`: Defines common defaults for a GPU architecture family.
- `GPUInfo`: Comprehensive information about a GPU architecture.
- `Vendor`: Represents GPU vendors.
Functions
- `is_cpu`: Checks if the target is a CPU (compile-time version).
- `is_gpu`: Checks if the target is a GPU (compile-time version).
- `is_valid_target`: Checks if the target is valid (compile-time version).