HOP to the Next Tasks and Domains for Continual Learning in NLP (2402.18449v1)

Published 28 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Continual Learning (CL) aims to learn a sequence of problems (i.e., tasks and domains) by transferring knowledge acquired on previous problems while avoiding forgetting of past ones. Unlike previous approaches, which focused on CL for one NLP task or domain in a specific use case, in this paper we address a more general CL setting, learning from a sequence of problems in a unified framework. Our method, HOP, permits hopping across tasks and domains by addressing the CL problem along three directions: (i) we employ a set of adapters to generalize a large pre-trained model to unseen problems; (ii) we compute high-order moments over the distribution of embedded representations to distinguish independent and correlated statistics across different tasks and domains; (iii) we process this enriched information with auxiliary heads specialized for each end problem. An extensive experimental campaign on 4 NLP applications, 5 benchmarks, and 2 CL setups demonstrates the effectiveness of HOP.
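
The three directions can be pictured concretely. Below is a minimal PyTorch sketch of the ingredients the abstract names: a bottleneck adapter over frozen encoder states, high-order (central-moment) pooling over token embeddings, and a per-problem auxiliary head. The module names, dimensions, and exact moment computation are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the three components described in the abstract.
# Sizes and the moment computation are assumptions for illustration only.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter applied to frozen pre-trained encoder states."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pre-trained representation intact.
        return x + self.up(torch.relu(self.down(x)))


def high_order_pooling(h: torch.Tensor, max_order: int = 3) -> torch.Tensor:
    """Pool token embeddings (batch, seq, dim) into concatenated central moments."""
    mean = h.mean(dim=1)                             # first moment
    centered = h - mean.unsqueeze(1)
    moments = [mean]
    for k in range(2, max_order + 1):
        moments.append((centered ** k).mean(dim=1))  # k-th central moment
    return torch.cat(moments, dim=-1)                # (batch, dim * max_order)


class ProblemHead(nn.Module):
    """Auxiliary classification head specialized for one task/domain."""

    def __init__(self, hidden_dim: int, num_classes: int, max_order: int = 3):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim * max_order, num_classes)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.classifier(pooled)


# Usage: adapt encoder states, pool moments, classify with the head
# registered for the current problem in the sequence.
hidden = torch.randn(8, 128, 768)  # stand-in for frozen encoder outputs
adapter, head = Adapter(768), ProblemHead(768, num_classes=4)
logits = head(high_order_pooling(adapter(hidden)))
print(logits.shape)  # torch.Size([8, 4])
```

Concatenating moments beyond the mean lets each per-problem head see the shape of the embedding distribution, which matches the abstract's intuition of separating independent from correlated statistics across tasks and domains.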
