Occupancy

In GPU programming, occupancy measures how fully a GPU's compute resources are in use. It is defined as the ratio of the number of active warps to the maximum number of warps that a streaming multiprocessor (SM) can keep active at one time.
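
As a rough illustration of this ratio, the following Python sketch computes occupancy from a warp count. The helper names (`warps_per_block`, `occupancy`), the warp size of 32 threads, and the limit of 64 warps per SM are assumptions chosen for the example, not values taken from any particular GPU or API.

```python
import math


def warps_per_block(threads_per_block: int, warp_size: int = 32) -> int:
    """Number of warps needed to hold one thread block.

    A partially filled warp still occupies a full warp slot, so round up.
    The default warp size of 32 is an assumption for this example.
    """
    if threads_per_block <= 0 or warp_size <= 0:
        raise ValueError("threads_per_block and warp_size must be positive")
    return math.ceil(threads_per_block / warp_size)


def occupancy(active_warps: int, max_warps_per_sm: int) -> float:
    """Ratio of active warps to the maximum warps an SM can keep active."""
    if max_warps_per_sm <= 0:
        raise ValueError("max_warps_per_sm must be positive")
    if not 0 <= active_warps <= max_warps_per_sm:
        raise ValueError("active_warps must be between 0 and max_warps_per_sm")
    return active_warps / max_warps_per_sm


# Hypothetical scenario: 4 blocks of 256 threads are resident on one SM,
# and the SM can keep at most 64 warps active (assumed limit).
resident_blocks = 4
active = resident_blocks * warps_per_block(256)  # 4 blocks * 8 warps = 32 warps
print(f"{occupancy(active, 64):.0%}")  # prints "50%"
```

In this hypothetical case, half of the SM's warp slots are filled. The number of blocks that can actually be resident on a real SM depends on the hardware and on each block's resource usage, which this sketch takes as a given input.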

Higher occupancy can improve parallel execution and help hide memory latency, because the SM has more warps to switch to while others wait on memory. However, increasing occupancy does not always improve performance: factors such as memory bandwidth and instruction dependencies can still be the bottleneck. The optimal occupancy level depends on the workload and the GPU architecture.