IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /get-started.md). For the complete documentation index, see llms.txt.

AI terms

Attention

What is attention?

Attention mask

What is an attention mask?

Autoregression

What is autoregression?

Batching

What is batching?

Context encoding

What is context encoding?

Continuous batching

What is continuous batching?

Disaggregated inference

What is disaggregated inference?

Embedding

What is an embedding?

Flash attention

What is flash attention?

Inference routing

What is inference routing?

KV cache

What is a KV cache?

Padding tokens

What are padding tokens?

PagedAttention

What is PagedAttention?

Ragged tensors

What are ragged tensors?

Tokenization

What is tokenization?

Transformer

What is a transformer model?
