Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts (2403.12918v1)

Published 19 Mar 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Pretrained Language Models (PLMs) have advanced NLP tasks significantly, but finetuning PLMs on low-resource datasets poses significant challenges such as instability and overfitting. Previous methods tackle these issues by finetuning a strategically chosen sub-network on a downstream task, while keeping the remaining weights fixed to the pretrained weights. However, they rely on a suboptimal criterion for sub-network selection, leading to suboptimal solutions. To address these limitations, we propose a regularization method based on attention-guided weight mixup for finetuning PLMs. Our approach represents each network weight as a mixup of task-specific weight and pretrained weight, controlled by a learnable attention parameter, providing finer control over sub-network selection. Furthermore, we employ a bi-level optimization (BLO) based framework on two separate splits of the training dataset, improving generalization and combating overfitting. We validate the efficacy of our proposed method through extensive experiments, demonstrating its superiority over previous methods, particularly in the context of finetuning PLMs on low-resource datasets.
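To make the attention-guided weight mixup idea concrete, below is a minimal sketch (not the authors' implementation) of a wrapped linear layer whose effective weight is an element-wise interpolation between the frozen pretrained weight and a trainable task-specific weight, gated by a learnable attention parameter. The class name `MixupLinear` and the sigmoid gating on per-weight logits are illustrative assumptions; the paper's bi-level optimization, which would update the task weights on one data split and the attention parameters on a second split, is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixupLinear(nn.Module):
    """Illustrative attention-guided weight mixup around a pretrained nn.Linear."""

    def __init__(self, pretrained_linear: nn.Linear):
        super().__init__()
        # Frozen copies of the pretrained weights (kept fixed during finetuning).
        self.register_buffer("w_pre", pretrained_linear.weight.detach().clone())
        self.register_buffer("b_pre", pretrained_linear.bias.detach().clone())
        # Task-specific weights, initialized from the pretrained ones and trained on the task.
        self.w_task = nn.Parameter(pretrained_linear.weight.detach().clone())
        self.b_task = nn.Parameter(pretrained_linear.bias.detach().clone())
        # Learnable attention logits; a sigmoid maps them to mixing ratios in (0, 1),
        # giving per-weight control over how much of the task-specific update is used.
        self.alpha_logit = nn.Parameter(torch.zeros_like(self.w_pre))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.alpha_logit)
        # Element-wise mixup of task-specific and pretrained weights.
        w = alpha * self.w_task + (1.0 - alpha) * self.w_pre
        b = alpha.new_tensor(0)  # placeholder to keep the sketch simple
        return F.linear(x, w, self.b_task)
```

In a bi-level training loop, one would alternate gradient steps on `w_task`/`b_task` using one split of the training data and on `alpha_logit` using a second split, so that the mixing ratios are chosen for generalization rather than for fitting the same data as the task weights.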

Authors (5)
  1. Sai Ashish Somayajula (8 papers)
  2. Youwei Liang (16 papers)
  3. Abhishek Singh (71 papers)
  4. Li Zhang (693 papers)
  5. Pengtao Xie (86 papers)
Citations (1)
