v25.1.1 (2025-02-19)

Fix performance issues in autoregressive models with paged attention by setting sensible, platform-specific default values for --max-num-steps.
