Effectiveness of Data Augmentation for Parameter Efficient Tuning with Limited Data (2303.02577v2)
Abstract: Recent work has demonstrated that parameter-efficient tuning techniques such as prefix tuning (or P-tuning) applied to pretrained LLMs can yield performance comparable or superior to fine-tuning while dramatically reducing the number of trainable parameters. Nevertheless, the effectiveness of such methods in the context of data augmentation, a common strategy for improving learning in low-data regimes, has not been fully explored. In this paper, we examine the effectiveness of several popular task-agnostic data augmentation techniques, namely EDA, Back Translation, and Mixup, when combined with two general parameter-efficient tuning methods, P-tuning v2 and LoRA, under data scarcity. We show that data augmentation can boost the performance of P-tuning and LoRA models, but that the effectiveness of each technique varies, and certain methods can lead to a notable degradation in performance, particularly with larger models and on harder tasks. We further analyze the sentence representations of P-tuning compared to fine-tuning to help understand this behaviour, and reveal that P-tuning generally has a more limited ability to separate the sentence embeddings of different classes of augmented data, and that it performs worse on heavily altered data. However, we demonstrate that adding a simple contrastive loss can help mitigate these issues for prefix tuning, resulting in sizable improvements on augmented data.
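To make the proposed mitigation concrete, below is a minimal PyTorch sketch (not the authors' implementation) of pairing the usual cross-entropy objective with a supervised, InfoNCE-style contrastive term computed over pooled sentence embeddings, so that same-class embeddings of original and augmented examples are pulled together and different-class embeddings are pushed apart. The function name, the temperature of 0.1, and the weighting coefficient `alpha` are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def loss_with_contrastive_term(logits, embeddings, labels, temperature=0.1, alpha=0.5):
    """Cross-entropy on the classifier logits plus a supervised, InfoNCE-style
    contrastive term over pooled sentence embeddings (illustrative sketch)."""
    ce = F.cross_entropy(logits, labels)

    z = F.normalize(embeddings, dim=-1)            # (batch, hidden), unit-normalized
    sim = z @ z.T / temperature                    # pairwise similarities
    eye = torch.eye(labels.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))      # ignore self-similarity

    # Positives: other in-batch examples (original or augmented) with the same label.
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)
    has_pos = pos_mask.any(dim=1)
    contrastive = per_anchor[has_pos].mean() if has_pos.any() else logits.new_zeros(())

    return ce + alpha * contrastive
```

In practice, each mini-batch would contain both the original sentences and their augmented variants (e.g., from EDA or Back Translation), with `embeddings` taken as the pooled output of the prefix-tuned encoder and `logits` from its classification head.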
- Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev Shlomov, Naama Tepper, and Naama Zwerdling. 2020. Do Not Have Enough Data? Deep Learning to the Rescue! In AAAI.
- Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. 2006. The second PASCAL recognising textual entailment challenge.
- Yoshua Bengio, Yann Lecun, and Geoffrey Hinton. 2021. Deep learning for AI. Commun. ACM, 64(7):58–65.
- Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge.
- Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, pages 177–190.
- Marie-Catherine de Marneffe, Mandy Simons, and Judith Tonhauser. 2019. The CommitmentBank: Investigating projection in naturally occurring discourse.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation.
- Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, and Eduard Hovy. 2021. A Survey of Data Augmentation Approaches for NLP. In ACL/IJCNLP (Findings), pages 968–988.
- Siddhant Garg and Goutham Ramakrishnan. 2020. BAE: BERT-based adversarial examples for text classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
- Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. 2007. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing, pages 1–9. Association for Computational Linguistics.
- Hongyu Guo. 2020. Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification. In AAAI.
- Hongyu Guo, Yongyi Mao, and Richong Zhang. 2019. Augmenting Data with Mixup for Sentence Classification: An Empirical Study. ArXiv, abs/1905.08941.
- Hongyu Guo, Yongyi Mao, and Richong Zhang. 2019. MixUp as Locally Linear Out-Of-Manifold Regularization. In AAAI.
- Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2022. Towards a Unified View of Parameter-Efficient Transfer Learning. In International Conference on Learning Representations.
- Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-Rank Adaptation of Large Language Models.
- Hector J. Levesque, Ernest Davis, and Leora Morgenstern. 2011. The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, volume 46, page 47.
- Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation.
- Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2021. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. CoRR, abs/2110.07602.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach.
- Shayne Longpre, Yu Wang, and Chris DuBois. 2020. How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers? In EMNLP (Findings), pages 4401–4411.
- Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations.
- Marius Mosbach, Maksym Andriushchenko, and Dietrich Klakow. 2021. On the stability of fine-tuning BERT: Misconceptions, explanations, and strong baselines. In International Conference on Learning Representations.
- On the impact of data augmentation on downstream performance in natural language processing. In Proceedings of the Third Workshop on Insights from Negative Results in NLP, pages 88–93, Dublin, Ireland. Association for Computational Linguistics.
- Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding.
- Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, and Iryna Gurevych. 2020. AdapterHub: A Framework for Adapting Transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 46–54.
- Guanghui Qin and Jason Eisner. 2021. Learning How to Ask: Querying LMs with Mixtures of Soft Prompts.
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners.
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
- Melissa Roemmele, Cosmin Adrian Bejan, and Andrew S. Gordon. 2011. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In 2011 AAAI Spring Symposium Series.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86–96, Berlin, Germany. Association for Computational Linguistics.
- Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013. Parsing With Compositional Vector Grammars. In ACL.
- Jörg Tiedemann. 2020. The Tatoeba translation challenge – realistic data sets for low resource and multilingual MT. In Proceedings of the Fifth Conference on Machine Translation, pages 1174–1182, Online. Association for Computational Linguistics.
- Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605.
- Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. arXiv preprint arXiv:1905.00537.
- Jason Wei and Kai Zou. 2019. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6382–6388, Hong Kong, China. Association for Computational Linguistics.
- Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. 2016. A Discriminative Feature Learning Approach for Deep Face Recognition. In ECCV.
- Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, et al. 2020. HuggingFace’s Transformers: State-of-the-art Natural Language Processing.
- Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. 2020. Big Bird: Transformers for Longer Sequences. In NeurIPS, pages 17283–17297. Curran Associates, Inc.
- Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations.
Authors: Stephen Obadinma, Hongyu Guo, and Xiaodan Zhu