- The paper proposes FROMP, a framework that uses a Gaussian-process formulation of deep networks to convert weight-space regularization into function-space regularization for retaining prior knowledge.
- It introduces memorable past examples selected near decision boundaries to counteract catastrophic forgetting during continual learning.
- Empirical evaluations on benchmarks such as Permuted MNIST and Split CIFAR show FROMP outperforms traditional weight-based methods in robustness and consistency.
Continual Deep Learning by Functional Regularisation of Memorable Past: A Review
The paper "Continual Deep Learning by Functional Regularisation of Memorable Past" addresses a fundamental challenge in deep learning: the catastrophic forgetting of previously learned tasks when a model is updated with new information. The authors propose a novel approach called FROMP (Functional Regularisation of Memorable Past) to mitigate this issue, leveraging a Gaussian Process (GP) formulation to establish a functional-prior regularisation method.
Overview of Approach
In continual learning, deep models suffer from catastrophic forgetting: performance on old tasks degrades after fine-tuning on new ones. The authors review weight-regularization methods, which penalize changes to weights identified as important for previous tasks. They argue that these approaches can be ineffective because the mapping from weights to predictions is intricate: large changes in some weights may barely affect the network's outputs, while small changes elsewhere can alter predictions drastically.
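To make the contrast concrete, the weight-space penalties the authors critique (EWC-style) can be sketched as a quadratic cost on weight movement, scaled by a per-weight importance estimate. This is a minimal illustration, not the paper's implementation; the function name and the diagonal-Fisher weighting are assumptions for exposition.

```python
import numpy as np

def ewc_penalty(weights, old_weights, fisher_diag, lam=1.0):
    """EWC-style quadratic penalty: moving weights with large Fisher
    (importance) values away from their old-task values is costly;
    moving unimportant weights is nearly free."""
    w = np.asarray(weights, dtype=float)
    w_old = np.asarray(old_weights, dtype=float)
    f = np.asarray(fisher_diag, dtype=float)
    return 0.5 * lam * np.sum(f * (w - w_old) ** 2)

# Same-sized weight changes, very different costs depending on importance:
p_small = ewc_penalty([1.0, 2.0], [1.0, 1.9], fisher_diag=[10.0, 0.01])
p_large = ewc_penalty([1.1, 2.0], [1.0, 2.0], fisher_diag=[10.0, 0.01])
```

The paper's point is that this importance weighting lives entirely in weight space: the penalty never looks at what the network actually predicts, which is what FROMP changes.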
The proposed method, FROMP, uses a Gaussian-process formulation of the trained network to construct a functional prior. Rather than regularizing weights directly, it operates in function space, constraining the network's predictions at a small set of memorable past examples so that they remain consistent across tasks. Memorable past examples are those lying close to the decision boundary, the hardest to classify, and therefore the most informative for preserving what was learned.
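A minimal sketch of boundary-based selection, assuming a binary sigmoid classifier: the curvature-related score p(1 - p) peaks at p = 0.5, so ranking by it surfaces the examples nearest the decision boundary. The helper name and this exact ranking rule are illustrative stand-ins for the paper's selection criterion.

```python
import numpy as np

def select_memorable(probs, m):
    """Rank examples by p * (1 - p), which is maximal at the decision
    boundary (p = 0.5) and vanishes for confidently classified points,
    then keep the indices of the m highest-scoring examples."""
    p = np.asarray(probs, dtype=float)
    scores = p * (1.0 - p)           # peaks at p = 0.5
    return np.argsort(-scores)[:m]   # indices of the m most "memorable"

# Examples at 0.52 and 0.45 sit nearest the boundary and are selected:
idx = select_memorable([0.99, 0.52, 0.10, 0.45, 0.80], m=2)
```

Because only a few such points are kept per task, the memory cost stays small while the constraint targets exactly the regions of input space where predictions are most fragile.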
The paper provides a thorough experimental evaluation of FROMP across several benchmarks, including Permuted MNIST, Split MNIST, and Split CIFAR. The results consistently show that FROMP achieves state-of-the-art performance by combining the strengths of weight-space and function-space regularization, outperforming both weight-centric approaches such as Elastic Weight Consolidation (EWC) and earlier function-space methods that do not exploit a functional prior.
Key Findings and Implications
- Memorable Past Example Selection: The method effectively selects memorable past examples using function-space criteria, allowing the model to gracefully retain past knowledge without storing extensive data. This efficiency is crucial for scalability in real-world scenarios.
- Functional Regularisation via Gaussian Processes: The Gaussian-process formulation provides a tractable representation of the network's input-output behaviour. This enables more robust continual learning through functional regularization that accounts for predictive uncertainty and keeps outputs consistent on past tasks.
- Consistency and Robustness: Experiments show that FROMP not only avoids catastrophic forgetting more effectively than existing methods but also displays robustness across various types of datasets, indicating flexibility in adapting to diverse application areas.
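Putting the pieces above together, the function-space penalty can be sketched as a weighted quadratic on the network's outputs at the memorable points: the current outputs are pulled toward the outputs stored after the previous task, with a precision term standing in for the GP posterior precision. All names here are illustrative; this is a sketch of the idea, not the paper's exact objective.

```python
import numpy as np

def functional_reg(f_now, f_past, precision, tau=1.0):
    """Function-space regularizer (FROMP-style sketch): penalize the
    deviation of current outputs f_now from stored past outputs f_past
    at the memorable points, weighted by a (here diagonal) precision
    that encodes how confidently each point was remembered."""
    d = np.asarray(f_now, dtype=float) - np.asarray(f_past, dtype=float)
    lam = np.asarray(precision, dtype=float)
    return 0.5 * tau * float(d @ (lam * d))

# Equal-sized output drift costs more at a high-precision point than at
# a low-precision one:
hi = functional_reg([1.2, 0.0], [1.0, 0.0], precision=[5.0, 0.1])
lo = functional_reg([1.0, 0.2], [1.0, 0.0], precision=[5.0, 0.1])
```

The contrast with the weight-space penalty is direct: the regularizer now depends only on what the network predicts at the memorable points, so weights are free to change arbitrarily as long as those predictions are preserved.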
Future Directions
The paper sets the stage for future research by raising several open questions:
- How can the principles of FROMP be extended to more complex models and tasks, such as those involving natural language processing or large-scale vision data?
- What improvements in task adaptation can be realized if the memorable past identification process is enhanced with more sophisticated selection techniques?
- How can the integration of functional regularization techniques further bridge the gap between neural networks and Gaussian processes, potentially leading to unified frameworks that capitalize on the strengths of both paradigms?
In conclusion, this paper provides a compelling framework for advancing continual learning in deep neural networks through function-space regularization, laying a foundation for more resilient AI systems.