Investigating Transferability in Pretrained Language Models (2004.14975v2)

Published 30 Apr 2020 in cs.CL, cs.AI, and cs.LG

Abstract: How does language model pretraining help transfer learning? We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance. This method, partial reinitialization, involves replacing different layers of a pretrained model with random weights, then finetuning the entire model on the transfer task and observing the change in performance. This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks. Furthermore, the benefit of using pretrained parameters for a layer varies dramatically with finetuning dataset size: parameters that provide tremendous performance improvement when data is plentiful may provide negligible benefits in data-scarce settings. These results reveal the complexity of the transfer learning process, highlighting the limitations of methods that operate on frozen models or single data samples.
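
The partial-reinitialization procedure described in the abstract can be made concrete with a short sketch. The code below assumes the HuggingFace transformers library; the helper name partially_reinitialize and the choice of layer indices are illustrative, not the authors' implementation. It resets selected BERT encoder layers to random weights while keeping the rest pretrained, after which the whole model would be finetuned on the transfer task.

```python
from transformers import BertForSequenceClassification

def partially_reinitialize(model, layers_to_reset):
    """Replace the weights of selected BERT encoder layers with fresh random values."""
    for idx in layers_to_reset:
        layer = model.bert.encoder.layer[idx]
        # Re-run the model's standard weight initializer on every submodule of
        # this layer, discarding its pretrained parameters.
        layer.apply(model._init_weights)
    return model

# Illustrative usage: reinitialize the top four encoder layers of bert-base,
# then finetune the entire model on the transfer task and compare accuracy
# against the fully pretrained baseline.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model = partially_reinitialize(model, layers_to_reset=range(8, 12))
```

Comparing downstream accuracy across different choices of reinitialized layers and finetuning dataset sizes is what lets the ablation attribute transfer benefits to individual pretrained layers.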

Authors (4)
  1. Alex Tamkin
  2. Trisha Singh
  3. Davide Giovanardi
  4. Noah Goodman
Citations (46)
