- The paper introduces an information disentanglement-based regularization method for continual text classification that separates task-generic and task-specific features.
- The approach employs auxiliary tasks and a K-means memory selection mechanism to reduce memory costs while enhancing model adaptability.
- Empirical results show higher average accuracy than strong baselines such as MBPA++ and LAMOL, with robust performance across varying task orders and sequence lengths.
Continual Learning for Text Classification with Information Disentanglement Based Regularization
The paper "Continual Learning for Text Classification with Information Disentanglement Based Regularization" presents a method for improving continual learning in text classification. Continual learning is essential because it enables models to learn incrementally from streams of data without catastrophic forgetting, the phenomenon in which a model loses previously acquired knowledge as it learns new tasks.
Existing continual learning approaches in NLP fall broadly into replay-based methods, which revisit stored or generated samples from previous tasks, and regularization-based methods, which constrain model parameters to retain learned knowledge. Replay-based methods incur high memory and computational costs because they must store many past examples or train a generative model to reproduce them. Regularization-based approaches, on the other hand, typically apply uniform constraints across all parameters, failing to distinguish information that is generic to all tasks from information specific to individual tasks.
This paper introduces an information disentanglement-based regularization method for continual text classification. The key idea is to disentangle the hidden representation of a text into a task-generic space and a task-specific space. Task-generic representations are useful across tasks and are therefore regularized to stay stable during task transitions; task-specific representations, being unique to individual tasks, are given more freedom to adapt to new task requirements.
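A minimal sketch of the asymmetry this implies: assume the hidden state is split into a generic block and a specific block, and penalize drift of the generic block more heavily. The split point, weights, and plain L2 form below are illustrative assumptions, not the paper's exact regularizer.

```python
def disentangled_reg_loss(h_new, h_old, split, lam_generic=1.0, lam_specific=0.1):
    """Asymmetric L2 drift penalty over a disentangled hidden state.

    The first `split` dimensions are treated as task-generic, the rest as
    task-specific. Setting lam_generic > lam_specific keeps generic features
    stable across task transitions while letting specific ones adapt.
    Illustrative sketch only; the paper's exact loss may differ.
    """
    gen = sum((a - b) ** 2 for a, b in zip(h_new[:split], h_old[:split]))
    spec = sum((a - b) ** 2 for a, b in zip(h_new[split:], h_old[split:]))
    return lam_generic * gen + lam_specific * spec
```

Shifting every dimension by the same amount then costs ten times more in the generic block than in the specific one, so gradient pressure pushes task adaptation into the specific block.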
To achieve this disentanglement, the authors use two auxiliary tasks: next sentence prediction to extract task-generic information, and task identifier prediction to learn task-specific representations. Next sentence prediction encourages the model to capture syntactic and contextual regularities that remain relevant across tasks, while task identifier prediction pushes the representation to encode task-specific nuances.
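Schematically, these pieces combine into a single training objective. The notation and weighting below are our own shorthand for summarizing the summary above, not necessarily the paper's exact formulation:

$$\mathcal{L} \;=\; \mathcal{L}_{\text{cls}} \;+\; \lambda_{\text{nsp}}\,\mathcal{L}_{\text{nsp}} \;+\; \lambda_{\text{task}}\,\mathcal{L}_{\text{task-id}} \;+\; \mathcal{L}_{\text{reg}},$$

where $\mathcal{L}_{\text{cls}}$ is the classification loss, $\mathcal{L}_{\text{nsp}}$ the next sentence prediction loss shaping the task-generic space, $\mathcal{L}_{\text{task-id}}$ the task identifier prediction loss shaping the task-specific space, and $\mathcal{L}_{\text{reg}}$ the regularizer that anchors representations to their values before the task switch.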
The authors supplement this regularization approach with a memory selection mechanism, utilizing K-means clustering to pick a diverse set of examples from past tasks. This strategy minimizes the memory footprint while ensuring effective knowledge retention.
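The selection step can be sketched as follows. This is a pure-Python illustration: the function name and the deterministic farthest-point initialization are our choices, not the paper's exact procedure (the paper specifies only that K-means is used to select diverse examples).

```python
def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_select(embeddings, k, iters=10):
    """Pick k diverse memory examples: cluster the embeddings of a finished
    task with K-means, then store the example nearest to each centroid."""
    # Farthest-point initialization: start from the first embedding and
    # repeatedly add the point farthest from all chosen centroids.
    centroids = [list(embeddings[0])]
    while len(centroids) < k:
        far = max(embeddings, key=lambda e: min(_dist2(e, c) for c in centroids))
        centroids.append(list(far))

    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for e in embeddings:
            j = min(range(k), key=lambda c: _dist2(e, centroids[c]))
            clusters[j].append(e)
        # Update step: centroids move to their cluster means.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]

    # Memory selection: index of the example closest to each centroid.
    return [min(range(len(embeddings)), key=lambda i: _dist2(embeddings[i], c))
            for c in centroids]
```

For example, with six 2-D embeddings forming two tight groups, `kmeans_select(points, k=2)` returns one representative index from each group, so the stored memory covers both regions of the embedding space.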
The paper provides extensive empirical evidence that the method outperforms state-of-the-art continual learning baselines such as MBPA++ and LAMOL in average accuracy across multiple benchmark datasets, with the largest gains in settings where replay memory is tightly restricted.
A rigorous evaluation across different task sequences tests the model's robustness to variations in task order and sequence length. The results show that information disentanglement markedly improves the model's ability to retain previous task knowledge while adapting to new tasks.
The study offers practical implications for deploying NLP systems that require adaptability to evolving data without degradation in performance, which is paramount in real-world applications like social media monitoring and customer sentiment analysis. Theoretically, this approach contributes to ongoing research in lifelong learning systems by offering a robust framework to balance stability and plasticity in model learning dynamics.
Potential areas for future development include extending the methodology to other sequential learning problems in NLP, such as sequence generation, or to settings with more complex task interdependencies. Overall, the paper provides a detailed theoretical basis, supported by empirical validation, for using information disentanglement in continual learning frameworks.