
Knowledge Fusion of Large Language Models (2401.10491v2)

Published 19 Jan 2024 in cs.CL

Abstract: While training LLMs from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weights is impractical. In this paper, we introduce the notion of knowledge fusion for LLMs, aimed at combining the capabilities of existing LLMs and transferring them into a single LLM. By leveraging the generative distributions of source LLMs, we externalize their collective knowledge and unique strengths, thereby potentially elevating the capabilities of the target model beyond those of any individual source LLM. We validate our approach using three popular LLMs with different architectures--Llama-2, MPT, and OpenLLaMA--across various benchmarks and tasks. Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation. Our code, model weights, and data are public at \url{https://github.com/fanqiwan/FuseLLM}.

Introduction

In the landscape of NLP, the development of LLMs represents a significant stride forward in the ability of machines to process and understand human language. Training such models, while it yields powerful tools, demands substantial computational resources. This paper introduces an alternative to building these complex models from the ground up: a technique called knowledge fusion, which merges the expertise of existing pre-trained LLMs to produce a more capable successor without the costs and environmental impact traditionally associated with training from scratch.

Methodology

The knowledge fusion strategy sidesteps the constraints of traditional approaches, which either require homogeneous model architectures (as in weight merging) or keep multiple models running in parallel (as in ensembling). Instead, it harnesses the predictive signal embedded in the generative distributions of the source LLMs. By focusing on the token-level probability distributions these models produce, the authors transfer the unique knowledge and strengths of each contributing LLM into a single target LLM through lightweight continual training. The fusion happens not by blending raw model parameters but by aligning and combining the token probabilities each source model assigns to the same text inputs.
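
The authors release their full implementation in the FuseLLM repository; the snippet below is only a minimal sketch of the core idea, assuming the per-token distributions of the source models have already been aligned to the target tokenizer's vocabulary (that alignment is a separate, non-trivial step of the method). The MinCE-style selection rule, the function names, and the weighting coefficient `lam` are illustrative assumptions here, not the exact published code.

```python
import torch
import torch.nn.functional as F

def fuse_distributions(source_probs, gold_ids):
    """Fuse per-token distributions from several source LLMs.

    source_probs: list of [seq_len, vocab] probability tensors, assumed to be
        already aligned to the target tokenizer's vocabulary.
    gold_ids: [seq_len] gold next-token ids.
    Illustrative MinCE-style rule: at each position, keep the source
    distribution that assigns the highest probability to the gold token.
    """
    stacked = torch.stack(source_probs)                   # [n_src, seq_len, vocab]
    idx = gold_ids.view(1, -1, 1).expand(stacked.size(0), -1, 1)
    gold_p = stacked.gather(-1, idx).squeeze(-1)          # [n_src, seq_len]
    best = gold_p.argmax(dim=0)                           # [seq_len]
    return stacked[best, torch.arange(stacked.size(1))]   # [seq_len, vocab]

def fusion_objective(target_logits, fused_probs, gold_ids, lam=0.9):
    """Weighted sum of the causal-LM loss and a divergence to the fused distribution."""
    clm = F.cross_entropy(target_logits, gold_ids)
    log_q = F.log_softmax(target_logits, dim=-1)
    div = F.kl_div(log_q, fused_probs, reduction="batchmean")
    return lam * clm + (1.0 - lam) * div
```

In continual training, an objective of this form would replace the plain language-modeling loss, so the target model is pulled both toward the gold tokens and toward the collective distribution of the source models.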

Evaluation

The authors put their method to the test using three distinct LLMs: Llama-2, MPT, and OpenLLaMA. Across multiple tasks and benchmarks related to reasoning, commonsense understanding, and code generation, knowledge fusion displays a marked improvement in performance over individual source models and a basic ensemble baseline. Importantly, the improvements are not just quantitative; the fused model exhibits gains in a broad array of capabilities, hinting at a qualitative enhancement of the model's knowledge base.
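
For context on the ensemble baseline: a probability-averaging ensemble must run every source model for every generated token, whereas the fused model is a single network at inference time. The sketch below is a hypothetical illustration of such a baseline, assuming HuggingFace-style causal LMs that share one tokenizer; heterogeneous models such as Llama-2, MPT, and OpenLLaMA would first require vocabulary alignment.

```python
import torch

@torch.no_grad()
def ensemble_next_token(models, input_ids):
    """Greedy next-token choice from a probability-averaging ensemble.

    models: HuggingFace-style causal LMs sharing one tokenizer (a simplifying
        assumption made here; the paper's source models do not share one).
    input_ids: [batch, seq_len] token ids.
    """
    avg_probs = None
    for model in models:
        logits = model(input_ids).logits[:, -1, :]   # logits at the last position
        probs = torch.softmax(logits, dim=-1)
        avg_probs = probs if avg_probs is None else avg_probs + probs
    avg_probs = avg_probs / len(models)
    return avg_probs.argmax(dim=-1)                  # [batch]
```

The relevant comparison is cost as well as accuracy: the ensemble multiplies inference compute by the number of source models, while knowledge fusion pays its extra cost once, during lightweight continual training.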

Implications and Conclusions

Concluding their findings, the researchers underline the potency and promise of knowledge fusion for LLMs, noting it as a fertile area for future work. Their experiments show that the fused model surpasses its individual parts, suggesting that the collective knowledge of distinct models, when harnessed appropriately, yields a whole greater than the sum of its parts. The work provides a foundation for more cost-effective, environmentally friendlier, and more capable advances in language modeling, opening the door to a range of applications that benefit from stronger LLMs.

Authors (6)
  1. Fanqi Wan (20 papers)
  2. Xinting Huang (36 papers)
  3. Deng Cai (181 papers)
  4. Xiaojun Quan (52 papers)
  5. Wei Bi (62 papers)
  6. Shuming Shi (126 papers)