Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM (2401.02994v3)

Published 4 Jan 2024 in cs.CL and cs.AI

Abstract: In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B parameters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ parameters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands.

Introduction

The field of conversational AI, particularly involving LLMs such as ChatGPT, has seen a trend toward creating ever-larger models to improve the quality of chat responses. However, these large models, often with hundreds of billions of parameters, come with significant computational and memory requirements. A recently introduced methodology called "Blending" addresses whether multiple smaller models combined could match or exceed the performance of a singular, larger model in the context of conversational AI.

Blending Methodology

The Blending technique integrates multiple smaller chat AI systems so that the combined system generates responses drawing on the strengths of each individual model. Empirical tests on the Chai research platform have demonstrated that an ensemble of three models, each with 6 to 13 billion parameters, can outdo a single model like ChatGPT, which has over 175 billion parameters. This is particularly noteworthy as the blended ensemble also yields significant improvements in user retention, indicating a more engaging user experience, while requiring only a fraction of the computational cost associated with larger models.
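The core mechanism is simple: at each conversation turn, one component model is drawn uniformly at random and generates the next reply conditioned on the full dialogue history, including earlier replies produced by the other models. The following Python sketch illustrates this selection loop; the model classes and `generate` interface are placeholders for whatever inference API the component models expose, not the paper's actual serving code.

```python
import random

class ChatModel:
    """Placeholder for one component chat model (e.g. a 6B or 13B system)."""

    def __init__(self, name):
        self.name = name

    def generate(self, history):
        # Stand-in for real model inference conditioned on the dialogue history.
        return f"[{self.name}] response to: {history[-1]['content']}"

class BlendedChatAI:
    """Blending: sample one component model uniformly at random per turn."""

    def __init__(self, models):
        self.models = models

    def respond(self, history):
        model = random.choice(self.models)   # uniform random selection
        return model.generate(history)       # conditioned on the full history,
                                             # including other models' turns

if __name__ == "__main__":
    blend = BlendedChatAI([ChatModel("model-6B-a"),
                           ChatModel("model-6B-b"),
                           ChatModel("model-13B")])
    history = [{"role": "user", "content": "Hi there!"}]
    print(blend.respond(history))
```

Because every model conditions on the shared conversation history, each reply implicitly builds on the behaviour of the others, which is what lets the blend behave like more than the sum of its parts.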

Empirical Evidence and Findings

A blend of smaller models, in which the responding model is drawn uniformly at random at each turn, appears to exhibit the "best of all" characteristics of its individual components, infusing diversity and a degree of specialized expertise into the chat responses. This results in a more dynamic and engaging interaction for users. Over the thirty-day research period, a comparison of user interaction statistics indicated superior performance of the blended models in both engagement and retention metrics, outpacing the single large model.
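For concreteness, the engagement and retention comparison can be thought of along the lines of the toy calculation below; the metric definitions, field names, and numbers are illustrative assumptions rather than the paper's actual evaluation pipeline.

```python
def retention_rate(users, day):
    """Fraction of users in a test group still active `day` days after joining."""
    return sum(1 for u in users if u["days_active"] >= day) / len(users)

def engagement(users):
    """Mean messages sent per user, a simple proxy for engagement."""
    return sum(u["messages_sent"] for u in users) / len(users)

# Toy A/B groups; in the study, real users were assigned either to the blended
# ensemble or to the single large model for a thirty-day window.
blended_group = [{"days_active": 30, "messages_sent": 120},
                 {"days_active": 12, "messages_sent": 45}]
baseline_group = [{"days_active": 8, "messages_sent": 30},
                  {"days_active": 25, "messages_sent": 80}]

print("30-day retention:", retention_rate(blended_group, 30),
      "vs", retention_rate(baseline_group, 30))
print("engagement:", engagement(blended_group),
      "vs", engagement(baseline_group))
```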

Implications and Future Directions

The significant takeaway from the paper is that increasing the sheer size of models may not be the only path toward enhancing conversational AI. Blending smaller models keeps computational demands modest while also delivering marked improvements in user engagement and conversation quality. Future research plans include scaling the number of component systems to further enrich conversation diversity, and training classifiers to predict the optimal chat AI to respond at any given turn in order to maximize engagement (a sketch of this selection policy follows). Such a learned policy would replace the uniform random choice with a more nuanced selection process and could allow new models to be added without risking degraded performance.
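A minimal sketch of how a learned selection policy could sit on top of Blending's uniform random choice; the classifier and its `predict_engagement` interface are hypothetical, introduced only for illustration.

```python
import random

def select_model(models, history, classifier=None):
    """Choose which component chat model responds at this turn.

    Without a classifier this reduces to Blending's uniform random draw;
    with a (hypothetical) trained classifier, the model predicted to
    maximize engagement given the conversation so far is chosen instead.
    """
    if classifier is None:
        return random.choice(models)
    scores = classifier.predict_engagement(history, models)  # assumed interface
    best_index = max(range(len(models)), key=lambda i: scores[i])
    return models[best_index]
```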

Conclusion

The Blending approach presents a compelling alternative to the industry's current trajectory of building ever-larger LLMs for conversational AI. The evidence suggests that a collaborative multi-model approach yields significant improvements in user engagement while maintaining leaner computational requirements. As this methodology finds its way into practice, it has the potential to redefine the strategies for developing future chat AIs, advocating for a more collaborative, multi-faceted approach over sheer size and scale.

Authors (7)
  1. Xiaoding Lu (5 papers)
  2. Adian Liusie (20 papers)
  3. Vyas Raina (18 papers)
  4. Yuwen Zhang (48 papers)
  5. William Beauchamp (4 papers)
  6. Zongyi Liu (6 papers)
  7. Vineet Mudupalli (2 papers)
Citations (11)