
Back-Translated Task Adaptive Pretraining: Improving Accuracy and Robustness on Text Classification (2107.10474v1)

Published 22 Jul 2021 in cs.CL and cs.LG

Abstract: Language models (LMs) pretrained on a large text corpus and fine-tuned on a downstream task have become a de facto training strategy for several NLP tasks. Recently, adaptive pretraining, which re-pretrains the LM with task-relevant data, has shown significant performance improvements. However, current adaptive pretraining methods suffer from underfitting on the task distribution owing to the relatively small amount of data available to re-pretrain the LM. To make full use of adaptive pretraining, we propose a back-translated task-adaptive pretraining (BT-TAPT) method that increases the amount of task-specific data for LM re-pretraining by augmenting the task data with back-translation, thereby generalizing the LM to the target task domain. The experimental results show that the proposed BT-TAPT yields improved classification accuracy on both low- and high-resource data and better robustness to noise than the conventional adaptive pretraining method.
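
The core idea of BT-TAPT is to expand the task corpus with back-translated paraphrases before re-pretraining the LM. The sketch below illustrates one way such a back-translation step could look, using publicly available MarianMT translation models from Hugging Face; the pivot language, model names, and generation settings are illustrative assumptions and not the authors' exact pipeline.

```python
# Minimal sketch of back-translation augmentation for task-adaptive pretraining.
# Model names, pivot language, and generation settings are assumptions for
# illustration, not the configuration used in the paper.
from transformers import MarianMTModel, MarianTokenizer


def load_pair(src_to_tgt: str, tgt_to_src: str):
    """Load a forward and a backward translation model."""
    fwd_tok = MarianTokenizer.from_pretrained(src_to_tgt)
    fwd_model = MarianMTModel.from_pretrained(src_to_tgt)
    bwd_tok = MarianTokenizer.from_pretrained(tgt_to_src)
    bwd_model = MarianMTModel.from_pretrained(tgt_to_src)
    return (fwd_tok, fwd_model), (bwd_tok, bwd_model)


def translate(texts, tokenizer, model, num_beams=5):
    """Translate a batch of sentences with beam search."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, num_beams=num_beams, max_length=256)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def back_translate(texts):
    """English -> German -> English round trip to produce paraphrases."""
    (fwd_tok, fwd_model), (bwd_tok, bwd_model) = load_pair(
        "Helsinki-NLP/opus-mt-en-de", "Helsinki-NLP/opus-mt-de-en"
    )
    pivot = translate(texts, fwd_tok, fwd_model)
    return translate(pivot, bwd_tok, bwd_model)


if __name__ == "__main__":
    task_texts = ["The movie was surprisingly good despite the slow start."]
    augmented = back_translate(task_texts)
    # In a BT-TAPT-style setup, these paraphrases would be pooled with the
    # original task data and used to re-pretrain the LM (e.g., with a
    # masked-language-modeling objective) before fine-tuning on classification.
    print(augmented)
```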

Authors (3)
  1. Junghoon Lee (18 papers)
  2. Jounghee Kim (2 papers)
  3. Pilsung Kang (28 papers)
Citations (5)