Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training (2009.11538v3)

Published 24 Sep 2020 in cs.CL

Abstract: Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. We explore unsupervised domain adaptation (UDA) in this paper. With the features from PrLMs, we adapt the models trained with labeled data from the source domain to the unlabeled target domain. Self-training, which predicts pseudo labels on the target domain data for training, is widely used for UDA. However, the predicted pseudo labels inevitably include noise, which will negatively affect training a robust model. To improve the robustness of self-training, in this paper we present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs, in which PrLM features are self-distilled into a feature adaptation module and the features from the same class are more tightly clustered. We further extend CFd to a cross-language setting, in which language discrepancy is studied. Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training in cross-domain and cross-language settings.
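The abstract describes two ingredients: self-training, which keeps only confident pseudo labels predicted on unlabeled target data, and a class-aware term that pulls features of the same (pseudo-)class closer together. A minimal NumPy sketch of these two ideas follows; the confidence threshold, the centroid-based clustering loss, and the function names are illustrative simplifications, not the paper's exact formulation:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep target examples whose softmax confidence exceeds a threshold.

    probs: (n, num_classes) array of predicted class probabilities
    Returns (indices kept, pseudo labels for those indices)."""
    confidence = probs.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

def class_clustering_loss(features, labels):
    """Mean squared distance of each feature to its class centroid,
    a simplified stand-in for CFd's class-aware clustering objective."""
    total = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        centroid = class_feats.mean(axis=0)
        total += ((class_feats - centroid) ** 2).sum()
    return total / len(features)

# Toy usage: 3 target examples, 2 classes.
probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.10, 0.90]])
keep, pseudo = select_pseudo_labels(probs)   # drops the uncertain middle row
feats = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
loss = class_clustering_loss(feats[keep], pseudo)
```

In the paper's full method, a feature adaptation module is additionally trained to self-distill the PrLM features, and the clustering term operates on those adapted features rather than raw embeddings.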

Authors (6)
  1. Hai Ye
  2. Qingyu Tan
  3. Ruidan He
  4. Juntao Li
  5. Hwee Tou Ng
  6. Lidong Bing
Citations (6)
