Glossary

Explanations for some terms and concepts you'll encounter in the Modular docs.

GPU terms

Block index

What is a GPU block index?

Grid

What is a GPU grid?

Kernel

What is a GPU kernel?

GPU memory

What is GPU memory?

Occupancy

What is GPU occupancy?

Register

What is a GPU register?

Streaming multiprocessor

What is a streaming multiprocessor (SM)?

Thread

What is a GPU thread?

Thread block

What is a GPU thread block?

Thread index

What is a GPU thread index?

Warp

What is a GPU warp?

AI terms

Attention

What is attention?

Attention mask

What is an attention mask?

Autoregression

What is autoregression?

Batching

What is batching?

Context encoding

What is context encoding?

Continuous batching

What is continuous batching?

Disaggregated inference

What is disaggregated inference?

Embedding

What is an embedding?

Flash attention

What is flash attention?

Inference routing

What is inference routing?

KV cache

What is KV cache?

Padding tokens

What are padding tokens?

PagedAttention

What is PagedAttention?

Ragged tensors

What are ragged tensors?

Tokenization

What is tokenization?

Transformer

What is a transformer model?
