OptLLM: Optimal Assignment of Queries to Large Language Models (2405.15130v1)

Published 24 May 2024 in cs.SE, cs.CL, and cs.LG

Abstract: Large language models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs, so a challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressing the cost-effective query allocation problem for LLMs. Given a set of input queries and candidate LLMs, our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences, including options for maximizing accuracy and minimizing cost. OptLLM predicts the performance of candidate LLMs on each query using a multi-label classification model with uncertainty estimation, and then iteratively generates a set of non-dominated solutions by destructing and reconstructing the current solution. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing. Our experimental results demonstrate that OptLLM substantially reduces costs by 2.40% to 49.18% while achieving the same accuracy as the best LLM. Compared to other multi-objective optimization algorithms, OptLLM improves accuracy by 2.94% to 69.05% at the same cost, or saves costs by 8.79% to 95.87% while maintaining the highest attainable accuracy.
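The core optimization the abstract describes — assigning each query to one candidate LLM so that expected accuracy and total cost form a set of non-dominated (Pareto-optimal) solutions — can be sketched as follows. The model names, predicted success probabilities, and per-query prices below are illustrative assumptions, not values from the paper, and the exhaustive enumeration is a stand-in for OptLLM's destruct-and-reconstruct search, which avoids enumerating the exponential assignment space.

```python
import itertools

# Hypothetical predicted success probabilities for 3 queries across
# 2 candidate LLMs, and hypothetical per-query prices (illustrative only).
pred_acc = {                     # pred_acc[query][llm] = predicted P(correct)
    "q1": {"cheap": 0.70, "strong": 0.95},
    "q2": {"cheap": 0.90, "strong": 0.97},
    "q3": {"cheap": 0.40, "strong": 0.92},
}
cost = {"cheap": 0.001, "strong": 0.02}   # dollars per query

def evaluate(assignment):
    """Return (expected accuracy, total cost) for one query->LLM mapping."""
    acc = sum(pred_acc[q][m] for q, m in assignment.items()) / len(assignment)
    total = sum(cost[m] for m in assignment.values())
    return acc, total

def dominates(a, b):
    """a dominates b if it is no worse on both objectives
    (higher accuracy, lower cost) and strictly better on at least one."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

# Enumerate every assignment (feasible only for tiny instances).
queries = list(pred_acc)
solutions = []
for choice in itertools.product(cost, repeat=len(queries)):
    assignment = dict(zip(queries, choice))
    solutions.append((assignment, evaluate(assignment)))

# Keep only the non-dominated solutions: the accuracy/cost trade-off front.
pareto = [
    (a, obj) for a, obj in solutions
    if not any(dominates(other, obj) for _, other in solutions)
]
for assignment, (acc, total) in sorted(pareto, key=lambda s: s[1][1]):
    print(f"cost=${total:.3f}  expected_acc={acc:.3f}  {assignment}")
```

With these toy numbers the front contains one solution per spending level, from all-cheap to all-strong, routing the hardest query ("q3", where the cheap model is weakest) to the strong model first — the kind of budget/accuracy menu the paper says OptLLM presents to users.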

Authors (5)
  1. Yueyue Liu (2 papers)
  2. Hongyu Zhang (147 papers)
  3. Yuantian Miao (6 papers)
  4. Van-Hoang Le (19 papers)
  5. Zhiqiang Li (81 papers)
Citations (1)