Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits (2403.07213v1)

Published 11 Mar 2024 in cs.LG and stat.ML

Abstract: Web-based applications such as chatbots, search engines and news recommendations continue to grow in scale and complexity with the recent surge in the adoption of LLMs. Online model selection has thus garnered increasing attention due to the need to choose the best model among a diverse set while balancing task reward and exploration cost. Organizations face decisions like whether to employ a costly API-based LLM or a locally finetuned small LLM, weighing cost against performance. Traditional selection methods often evaluate every candidate model before choosing one, which is becoming impractical given the rising costs of training and finetuning LLMs. Moreover, it is undesirable to allocate excessive resources towards exploring poor-performing models. While some recent works leverage online bandit algorithms to manage the exploration-exploitation trade-off in model selection, they tend to overlook the increasing-then-converging trend in model performance as the model is iteratively finetuned, leading to less accurate predictions and suboptimal model selections. In this paper, we propose a time-increasing bandit algorithm, TI-UCB, which effectively predicts the increase of model performance due to finetuning and efficiently balances exploration and exploitation in model selection. To further capture the converging points of models, we develop a change detection mechanism that compares consecutive increase predictions. We theoretically prove that our algorithm achieves a logarithmic regret upper bound in a typical increasing bandit setting, which implies a fast convergence rate. The advantage of our method is also empirically validated through extensive experiments on classification model selection and online selection of LLMs. Our results highlight the importance of exploiting the increasing-then-converging pattern for more efficient and economical model selection in the deployment of LLMs.
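For intuition, here is a minimal Python sketch of a TI-UCB-style selection loop based only on the abstract's description. It is not the paper's exact algorithm: the sliding-window least-squares trend prediction, the exploration bonus, the threshold `delta`, and the class and parameter names are illustrative assumptions.

```python
import numpy as np


class TIUCBSketch:
    """Illustrative TI-UCB-style model selector (a sketch, not the paper's exact method).

    Each candidate model (arm) keeps a sliding window of observed rewards.
    A least-squares linear trend predicts the next reward; the arm with the
    highest prediction plus an exploration bonus is played. A simple change
    detector compares consecutive slope estimates and marks an arm as
    converged once the trend stays flat. Window size and threshold are
    assumed values for illustration.
    """

    def __init__(self, n_arms, window=20, delta=0.01):
        self.n_arms = n_arms
        self.window = window                      # sliding window for trend fitting (assumed)
        self.delta = delta                        # slope-change threshold (assumed)
        self.history = [[] for _ in range(n_arms)]
        self.prev_slope = [None] * n_arms
        self.converged = [False] * n_arms

    def _trend(self, arm):
        """Fit a linear trend to the arm's recent rewards; return (prediction, slope)."""
        rewards = self.history[arm][-self.window:]
        if len(rewards) < 2:
            return float("inf"), 0.0              # force initial exploration of each arm
        x = np.arange(len(rewards))
        slope, intercept = np.polyfit(x, rewards, 1)
        if self.converged[arm]:
            slope = 0.0                           # treat a converged arm as flat
        prediction = intercept + slope * len(rewards)
        return prediction, slope

    def select(self, t):
        """Return the arm with the largest optimistic (predicted + bonus) score."""
        scores = []
        for arm in range(self.n_arms):
            prediction, _ = self._trend(arm)
            bonus = np.sqrt(2 * np.log(t + 1) / max(len(self.history[arm]), 1))
            scores.append(prediction + bonus)
        return int(np.argmax(scores))

    def update(self, arm, reward):
        """Record the observed reward and run the slope-based change detector."""
        self.history[arm].append(reward)
        if len(self.history[arm]) < self.window:
            return                                # not enough data to detect a change yet
        _, slope = self._trend(arm)
        prev = self.prev_slope[arm]
        if prev is not None and abs(slope - prev) < self.delta and slope < self.delta:
            self.converged[arm] = True            # consecutive trend estimates agree and are ~flat
        self.prev_slope[arm] = slope


# Usage sketch: in practice the reward would be the task metric of the chosen
# model (e.g., accuracy or ROUGE) observed after one more round of finetuning.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bandit = TIUCBSketch(n_arms=3)
    for t in range(500):
        arm = bandit.select(t)
        n_pulls = len(bandit.history[arm])
        # Synthetic increasing-then-converging reward: improves with pulls, then saturates.
        reward = min(1.0, 0.02 * n_pulls * (arm + 1) / 3) + 0.05 * rng.standard_normal()
        bandit.update(arm, reward)
```

The key design idea this sketch tries to convey is that the optimistic score tracks the predicted (increasing) reward rather than the historical mean, while the change detector prevents over-optimistic extrapolation once a model's performance has plateaued.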

Authors (7)
  1. Yu Xia (65 papers)
  2. Fang Kong (14 papers)
  3. Tong Yu (119 papers)
  4. Liya Guo (8 papers)
  5. Ryan A. Rossi (124 papers)
  6. Sungchul Kim (65 papers)
  7. Shuai Li (295 papers)
Citations (4)
