Glossary
Explanations for some terms and concepts you'll encounter in the Modular docs.
GPU terms
Block index
What is a GPU block index?
Grid
What is a GPU grid?
Kernel
What is a GPU kernel?
GPU memory
What is GPU memory?
Occupancy
What is GPU occupancy?
Register
What is a GPU register?
Streaming multiprocessor
What is a streaming multiprocessor (SM)?
Thread
What is a GPU thread?
Thread block
What is a GPU thread block?
Thread index
What is a GPU thread index?
Warp
What is a GPU warp?
AI terms
Attention
What is attention?
Attention mask
What is an attention mask?
Autoregression
What is autoregression?
Batching
What is batching?
Context encoding
What is context encoding?
Continuous batching
What is continuous batching?
Disaggregated inference
What is disaggregated inference?
Embedding
What is an embedding?
Flash attention
What is flash attention?
Inference routing
What is inference routing?
KV cache
What is KV cache?
Padding tokens
What are padding tokens?
PagedAttention
What is PagedAttention?
Ragged tensors
What are ragged tensors?
Tokenization
What is tokenization?
Transformer
What is a transformer model?