
LM-Cocktail: Resilient Tuning of Language Models via Model Merging

Published 22 Nov 2023 in cs.CL, cs.AI, and cs.IR | (2311.13534v4)

Abstract: The pre-trained LLMs are continually fine-tuned to better support downstream applications. However, this operation may result in significant performance degeneration on general tasks beyond the targeted domain. To overcome this problem, we propose LM-Cocktail, which enables the fine-tuned model to stay resilient on general tasks. Our method is conducted in the form of model merging, where the fine-tuned LLM is merged with the pre-trained base model or peer models from other domains through weighted averaging. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving superior capacity in its targeted domain. We conduct comprehensive experiments with LLaMA and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB, whose results validate the efficacy of our proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.


Summary

  • The paper introduces LM-Cocktail, a method that merges fine-tuned and pre-trained models to mitigate catastrophic forgetting.
  • It uses weighted parameter averaging based on few-shot examples to balance performance across general and target tasks.
  • Experiments on both encoder (BGE) and decoder (LLaMA) models demonstrate consistent improvements across multiple benchmarks.

"LM-Cocktail: Resilient Tuning of LLMs via Model Merging"

Abstract and Introduction

The paper "LM-Cocktail: Resilient Tuning of LLMs via Model Merging" (2311.13534) addresses the challenge of fine-tuning pre-trained LLMs for specific tasks without compromising their performance on general tasks. Traditional fine-tuning often results in catastrophic forgetting, where a model loses its ability to perform well on tasks outside its fine-tuning domain. This research proposes LM-Cocktail, a model merging method that preserves general capabilities while enhancing performance on targeted tasks.

The proposed method combines the fine-tuned model with the pre-trained base model and with models fine-tuned on other domains, using a weighted average of their parameters. The approach is simple and effective, and since it applies as a post-processing step after fine-tuning, it remains compatible with existing training pipelines. Empirical evaluations on LLaMA and BGE models demonstrate the efficacy of LM-Cocktail across various benchmarks, indicating broad applicability across model types.

Figure 1: The illustration of LM-Cocktail demonstrating improved accuracy on new target tasks while maintaining performance on other tasks.

LM-Cocktail: Framework and Variations

General Paradigm

The LM-Cocktail approach involves merging models by averaging their parameters based on performance on few-shot examples from the target domain. Given a pre-trained base model and a set of domain-specific fine-tuned models, LM-Cocktail constructs a resilient-tuned model that integrates strengths from multiple models. The merging formula is expressed as:

$$\mathcal{M}_r \leftarrow \alpha\,\mathcal{M}_t + (1-\alpha)\sum_i w_i\,\mathcal{M}_i$$

where $\mathcal{M}_r$ is the resilient-tuned model, $\mathcal{M}_t$ is the model fine-tuned on the target task, the $\mathcal{M}_i$ are the base model and peer models from other domains, $\alpha$ is a hyper-parameter, and the weights $w_i$ are computed from each model's prediction loss on few-shot examples from the target domain.
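As a minimal sketch of this procedure in PyTorch (not the authors' released implementation, which is available in the FlagEmbedding repository linked above): the softmax-over-negative-losses weighting, the temperature, and the `loss_fn` interface are assumptions made for illustration.

```python
import torch

def compute_merge_weights(models, few_shot_batch, loss_fn, temperature=1.0):
    """Weight each candidate model by its loss on few-shot target examples.

    Lower loss -> larger weight. The softmax normalization and temperature
    are illustrative assumptions, not the paper's exact formulation.
    """
    losses = []
    for model in models:
        model.eval()
        with torch.no_grad():
            losses.append(float(loss_fn(model, few_shot_batch)))
    return torch.softmax(-torch.tensor(losses) / temperature, dim=0).tolist()

def lm_cocktail_merge(target_model, peer_models, weights, alpha=0.5):
    """Parameter-wise weighted average:
    M_r <- alpha * M_t + (1 - alpha) * sum_i w_i * M_i.

    All models must share one architecture; non-floating-point state
    (e.g. integer buffers) is copied from the target model unchanged.
    """
    merged = {}
    for name, p in target_model.state_dict().items():
        merged[name] = alpha * p if p.is_floating_point() else p.clone()
    for w, peer in zip(weights, peer_models):
        for name, p in peer.state_dict().items():
            if p.is_floating_point():
                merged[name] = merged[name] + (1.0 - alpha) * w * p
    target_model.load_state_dict(merged)
    return target_model
```

Because the merge happens purely in parameter space, the cost is a single pass over the state dicts: no gradient computation or additional training is involved.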

Variations

LM-Cocktail includes adaptations for constrained scenarios:

  • Mono-Specialist: when no peer models from other domains are available, only the base model and the targeted fine-tuned model are merged (see the reduced formula after this list).
  • Without Fine-tuning: when targeted data is insufficient for fine-tuning, the base model is merged directly with peer models from other general domains.
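In the mono-specialist case, the summation collapses to the pre-trained base model $\mathcal{M}_b$ alone, so the general rule reduces to a two-model interpolation:

$$\mathcal{M}_r \leftarrow \alpha\,\mathcal{M}_t + (1-\alpha)\,\mathcal{M}_b$$

Here $\alpha$ directly controls the balance between target-task specialization and retained general capability.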

Experimental Setup

Decoder-based and Encoder-based Models

Experiments are performed using decoder-based (LLaMA) and encoder-based (BGE) models, across benchmarks like FLAN, MMLU, and MTEB. Fine-tuning involves various tasks and datasets, with evaluations conducted on unseen tasks using LM-Cocktail variants.

Results

LM-Cocktail consistently enhances performance compared to standard fine-tuning:

Figure 2: Performance with different alpha values demonstrates fine-tuning efficacy across tasks.

  • Decoder Models: Retain performance in the target domain and exhibit enhanced general task capabilities. LM-Cocktail$_{2}$ (merging the base and target models) and LM-Cocktail$_{10}$ (additionally including general-domain specialists) outperform fine-tuned models on unrelated tasks.
  • Encoder Models: Provide similar improvements, validating LM-Cocktail's universality across model types.

LM-Cocktail without Fine-tuning

Investigating scenarios where fine-tuning is infeasible, LM-Cocktail leverages models from other domains to enhance performance on new tasks. Results from additional tasks in MMLU show LM-Cocktail surpasses traditional methods, proving effective even with minimal data.

Analysis of Merging Approach

Impact of Weight Alpha

Adjustments to the merging weight $\alpha$ reveal improvements on general tasks with slight trade-offs in target-task performance, emphasizing the method's flexibility to tune model characteristics dynamically.

Figure 3: Performance of encoder-based LMs with different merging weights.
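A straightforward way to explore this trade-off is to sweep $\alpha$ and score the merged model on both the target task and a general-task suite. The sketch below reuses the `lm_cocktail_merge` helper from the earlier snippet; `eval_target` and `eval_general` are hypothetical user-supplied scoring functions.

```python
import copy

def sweep_alpha(target_model, peer_models, weights, eval_target, eval_general,
                alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Record (target score, general score) for each merging weight alpha."""
    results = {}
    for alpha in alphas:
        # Deep-copy so every merge starts from the original fine-tuned weights.
        merged = lm_cocktail_merge(copy.deepcopy(target_model),
                                   peer_models, weights, alpha=alpha)
        results[alpha] = (eval_target(merged), eval_general(merged))
    return results
```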

Example Number Effectiveness

Varying the number of few-shot examples shows that the merging weights, and hence the merged model's performance, stabilize with only a handful of samples, underscoring the efficiency of the merging process.

Conclusion

LM-Cocktail presents a practical solution to the problem of catastrophic forgetting in LLMs. Its ability to merge models efficiently broadens deployment possibilities without significant computational overhead. Future work may explore further optimization strategies for merging weights and extend the approach to other model types.

Overall, LM-Cocktail is a versatile tool in the domain of AI model tuning, offering resilience without sacrificing performance. The provided empirical evidence establishes a foundation for continued exploration and application in diverse AI systems.
