Two-stage LLM Fine-tuning with Less Specialization and More Generalization (2211.00635v3)
Abstract: Pretrained LLMs are general-purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performance, which is undesirable whenever the fine-tuned model needs to handle additional tasks where no fine-tuning data is available. In this work, we first demonstrate that fine-tuning on a single task indeed decreases LLMs' general in-context learning performance. We discover one important cause of such forgetting, format specialization, where the model overfits to the format of the fine-tuned task. We further show that format specialization happens at the very beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that reduces format specialization and improves generalization. ProMoT offloads task-specific format learning into additional and removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt attached. With experiments on several fine-tuning tasks and 8 in-context evaluation tasks, we show that ProMoT achieves comparable performance on fine-tuned tasks to standard fine-tuning, but with much less loss of in-context learning performance across a broad range of out-of-domain evaluation tasks. More importantly, ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task, e.g., ProMoT on En-Fr translation significantly improves performance on other language pairs, and ProMoT on NLI improves performance on summarization. Experiments also show that ProMoT can improve the generalization performance of multi-task training.
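To make the two-stage procedure concrete, here is a minimal sketch of ProMoT as described above: stage 1 (prompt tuning) freezes the model and trains only a soft prompt prepended to the input embeddings; stage 2 (model tuning) fine-tunes the model itself with that prompt attached. Everything concrete in the sketch is an illustrative assumption rather than the authors' implementation: the toy `ToyLM` model, the random `task_loader` data, the prompt length of 20, the learning rates, and the choice to freeze the prompt during stage 2.

```python
# Minimal sketch of the two-stage ProMoT procedure, using a toy PyTorch model
# in place of a real LLM. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class ToyLM(nn.Module):
    """Stand-in for a pretrained LM: token embeddings, a small Transformer, an LM head."""

    def __init__(self, vocab=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, input_ids, soft_prompt=None):
        x = self.embed(input_ids)                       # (B, T, d)
        if soft_prompt is not None:                     # prepend trainable prompt vectors
            prompt = soft_prompt.unsqueeze(0).expand(x.size(0), -1, -1)
            x = torch.cat([prompt, x], dim=1)
        return self.head(self.encoder(x))


def run_stage(trainable_params, model, soft_prompt, loader, lr):
    """One training pass over `loader`, updating only `trainable_params`."""
    opt = torch.optim.AdamW(trainable_params, lr=lr)
    for input_ids, labels in loader:
        logits = model(input_ids, soft_prompt)[:, soft_prompt.size(0):]  # drop prompt positions
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1)
        )
        opt.zero_grad()
        loss.backward()
        opt.step()


# Toy stand-in for the fine-tuning task's (input, target) token data.
data = TensorDataset(
    torch.randint(0, 1000, (256, 16)), torch.randint(0, 1000, (256, 16))
)
task_loader = DataLoader(data, batch_size=8, shuffle=True)

model = ToyLM()
soft_prompt = nn.Parameter(0.02 * torch.randn(20, 64))  # 20 trainable prompt vectors

# Stage 1 (prompt tuning): freeze the model and learn only the soft prompt, so the
# task-specific format is absorbed by these additional, removable parameters.
for p in model.parameters():
    p.requires_grad_(False)
run_stage([soft_prompt], model, soft_prompt, task_loader, lr=1e-3)

# Stage 2 (model tuning): fine-tune the model itself with the learned prompt attached.
# (Freezing the prompt here is an assumption; the abstract only says the model is
# fine-tuned with the soft prompt attached.)
soft_prompt.requires_grad_(False)
for p in model.parameters():
    p.requires_grad_(True)
run_stage(list(model.parameters()), model, soft_prompt, task_loader, lr=1e-4)
```

Because the task-specific format is offloaded into the soft prompt, a separate and removable set of parameters, the prompt can be kept for the fine-tuned task or detached when the model is used for other in-context tasks.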
Authors: Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Sanjiv Kumar, Inderjit S. Dhillon