Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors (2402.15713v1)

Published 24 Feb 2024 in cs.CL and cs.AI

Abstract: Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires a model to continuously learn novel relations while avoiding forgetting old ones, given only a few labeled training samples. The primary challenges are catastrophic forgetting and overfitting. This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models to address these two challenges, thereby making language models better continual few-shot relation extractors. Specifically, we propose a Contrastive Prompt Learning framework, which designs prompt representations to acquire more generalized knowledge that can be easily adapted to both old and new categories, and margin-based contrastive learning that focuses more on hard samples, thereby alleviating catastrophic forgetting and overfitting. To further remedy overfitting in low-resource scenarios, we introduce an effective memory augmentation strategy that employs well-crafted prompts to guide ChatGPT in generating diverse samples. Extensive experiments demonstrate that our method outperforms state-of-the-art methods by a large margin and significantly mitigates catastrophic forgetting and overfitting in low-resource scenarios.
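The abstract only names the ingredients of the objective. As a rough illustration, the sketch below shows one way a margin-based supervised contrastive loss over prompt-derived relation embeddings could be written in PyTorch: it follows the generic supervised contrastive formulation with an additive margin on positive pairs. The function name, temperature, and margin values are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a margin-based supervised contrastive loss.
# The paper's abstract does not give the exact objective; this follows the
# generic SupCon loss with an additive margin m subtracted from positive-pair
# similarities, so hard positives (low-similarity same-relation pairs)
# contribute larger gradients.
import torch
import torch.nn.functional as F

def margin_supcon_loss(embeddings: torch.Tensor,
                       labels: torch.Tensor,
                       temperature: float = 0.1,
                       margin: float = 0.1) -> torch.Tensor:
    """embeddings: (N, d) relation representations (e.g. prompt [MASK] outputs);
    labels: (N,) relation ids for current-task plus memory samples."""
    z = F.normalize(embeddings, dim=-1)                  # work in cosine space
    sim = z @ z.t()                                      # (N, N) pairwise similarities
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    self_mask = torch.eye(len(labels), device=z.device)
    pos_mask = pos_mask - self_mask                      # drop self-pairs

    # Additive margin: positives must exceed negatives by at least `margin`
    # before their loss term saturates.
    logits = (sim - margin * pos_mask) / temperature
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()  # numeric stability

    exp_logits = torch.exp(logits) * (1.0 - self_mask)   # exclude self from denominator
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True) + 1e-12)

    # Average log-likelihood over positive pairs, then over anchors
    # that actually have at least one positive in the batch.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (pos_mask * log_prob).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

In a continual setup, a loss of this shape would typically be computed over the prompt representations of current-task samples together with replayed memory samples, alongside the classification objective; the margin in effect requires same-relation pairs to beat other pairs by an extra gap, which biases training toward the hard samples the abstract refers to.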

Authors (4)
  1. Shengkun Ma (4 papers)
  2. Jiale Han (14 papers)
  3. Yi Liang (58 papers)
  4. Bo Cheng (51 papers)
Citations (2)