LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views (2402.04644v2)

Published 7 Feb 2024 in cs.LG and cs.AI

Abstract: Fine-tuning is widely used to leverage the power of pre-trained foundation models on new downstream tasks. While fine-tuning has seen many successes across tasks, recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions (i.e., out-of-distribution; OOD). To improve OOD generalization, some previous studies identify limitations of the fine-tuning data and regularize fine-tuning to preserve the general representation learned from pre-training data. However, potential limitations in the pre-training data and models are often ignored. In this paper, we contend that overly relying on the pre-trained representation may hinder fine-tuning from learning essential representations for downstream tasks and thus hurt OOD generalization. This can be especially catastrophic when new tasks come from different (sub)domains than the pre-training data. To address the issues in both pre-training and fine-tuning data, we propose LEVI (Layer-wise Ensemble of different VIews), a novel generalizable fine-tuning method in which the pre-trained model is adaptively ensembled layer-wise with a small task-specific model, while preserving efficiency. By combining two complementary models, LEVI effectively suppresses problematic features in both the fine-tuning data and the pre-trained model and preserves features useful for new tasks. Broad experiments with large language and vision models show that LEVI greatly improves fine-tuning generalization by emphasizing different views from the fine-tuning data and the pre-trained features.
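
To make the abstract's core idea concrete, below is a minimal PyTorch sketch of an adaptive layer-wise ensemble: a frozen pre-trained backbone and a small task-specific model are run side by side, and a learnable per-layer gate mixes their intermediate features. The class name `LayerwiseEnsemble`, the sigmoid gating, and the assumption that both models share the same hidden width are illustrative choices based on the abstract's high-level description, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class LayerwiseEnsemble(nn.Module):
    """Sketch of a layer-wise ensemble: a frozen pre-trained backbone and a
    small task-specific model are mixed layer by layer via learnable gates."""

    def __init__(self, pretrained_layers, task_layers, hidden_dim, num_classes):
        super().__init__()
        assert len(pretrained_layers) == len(task_layers)
        self.pretrained_layers = nn.ModuleList(pretrained_layers)
        self.task_layers = nn.ModuleList(task_layers)
        # One learnable scalar gate per layer decides how strongly the
        # task-specific view overrides the pre-trained view at that depth.
        self.gates = nn.Parameter(torch.zeros(len(task_layers)))
        self.head = nn.Linear(hidden_dim, num_classes)
        # Freeze the pre-trained backbone; only the small model, the gates,
        # and the prediction head receive gradients.
        for p in self.pretrained_layers.parameters():
            p.requires_grad = False

    def forward(self, x):
        h_pre, h_task = x, x
        for layer_pre, layer_task, gate in zip(
            self.pretrained_layers, self.task_layers, self.gates
        ):
            h_pre = layer_pre(h_pre)
            h_task = layer_task(h_task)
            alpha = torch.sigmoid(gate)                    # mixing weight in (0, 1)
            h_task = alpha * h_task + (1 - alpha) * h_pre  # ensemble the two views
        return self.head(h_task)


# Illustrative usage with toy MLP blocks of matching width (assumed here only
# so the two feature streams can be mixed directly).
dim = 64
pretrained = [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(4)]
task_model = [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(4)]
model = LayerwiseEnsemble(pretrained, task_model, hidden_dim=dim, num_classes=10)
logits = model(torch.randn(8, dim))  # -> shape (8, 10)
```

The gated sum lets each depth decide how much to trust the pre-trained view versus the task-specific view, which is one simple way to realize the abstract's goal of suppressing problematic features from either source while keeping features useful for the new task.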

Authors (11)
  1. Yuji Roh (11 papers)
  2. Qingyun Liu (6 papers)
  3. Huan Gui (11 papers)
  4. Zhe Yuan (75 papers)
  5. Yujin Tang (31 papers)
  6. Steven Euijong Whang (27 papers)
  7. Liang Liu (237 papers)
  8. Shuchao Bi (5 papers)
  9. Lichan Hong (35 papers)
  10. Ed H. Chi (74 papers)
  11. Zhe Zhao (97 papers)
Citations (1)