v25.1.1 (2025-02-19)
Fix performance issues in autoregressive models with paged attention
by setting sensible, platform-specific default values for
--max-num-steps.