Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing (2405.00467v1)

Published 1 May 2024 in cs.CL

Abstract: With the rapid development of LLMs, it is natural to ask how to harness their capabilities efficiently. In this paper, we explore whether it is feasible to direct each input query to a single most suitable LLM. To this end, we propose LLM routing for challenging reasoning tasks. Our extensive experiments suggest that such routing shows promise but is not feasible in all scenarios, so more robust approaches should be investigated to fill this gap.
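
As a rough illustration of the routing idea described in the abstract, the sketch below trains one lightweight correctness predictor per candidate LLM on embedded queries and then sends each new query to the model with the highest predicted chance of answering it correctly. All names here (`Router`, `toy_embed`, the use of logistic regression over query embeddings) are illustrative assumptions, not the authors' implementation, which this excerpt does not specify.

```python
# Minimal sketch of per-query LLM routing. Assumes (hypothetically) that we
# have per-model correctness labels for a set of training queries and some
# text encoder that turns a query into a feature vector.
from dataclasses import dataclass
from typing import Callable, List, Optional

import numpy as np
from sklearn.linear_model import LogisticRegression


@dataclass
class Router:
    model_names: List[str]
    embed: Callable[[str], np.ndarray]                      # query -> feature vector
    classifiers: Optional[List[LogisticRegression]] = None  # one per candidate model

    def fit(self, queries: List[str], correct: np.ndarray) -> "Router":
        """Fit one correctness predictor per candidate LLM.

        `correct[i, j]` is 1 if model j answered training query i correctly, else 0.
        """
        X = np.stack([self.embed(q) for q in queries])
        self.classifiers = [
            LogisticRegression(max_iter=1000).fit(X, correct[:, j])
            for j in range(len(self.model_names))
        ]
        return self

    def route(self, query: str) -> str:
        """Send the query to the model with the highest predicted success probability."""
        x = self.embed(query).reshape(1, -1)
        scores = [clf.predict_proba(x)[0, 1] for clf in self.classifiers]
        return self.model_names[int(np.argmax(scores))]


if __name__ == "__main__":
    # Toy demo with a hashing "embedder" and random correctness labels, purely
    # to show the interface; a real router would use a proper text encoder and
    # labels collected by running each candidate LLM on benchmark queries.
    rng = np.random.default_rng(0)

    def toy_embed(text: str) -> np.ndarray:
        vec = np.zeros(16)
        for tok in text.lower().split():
            vec[hash(tok) % 16] += 1.0
        return vec

    queries = [f"solve problem number {i}" for i in range(40)]
    labels = rng.integers(0, 2, size=(40, 2))  # correctness of 2 candidate models
    router = Router(["model-a", "model-b"], toy_embed).fit(queries, labels)
    print(router.route("solve problem number 3"))
```

A router of this kind needs per-model correctness labels on held-out queries (for example, from benchmark runs), which is consistent with the abstract's finding that routing shows promise on the tasks it was tuned for but is not feasible in all scenarios.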
