- The paper introduces IMM, which incrementally aligns the moments of posterior distributions to mitigate catastrophic forgetting in sequential tasks.
- IMM has two variants, Mean-IMM and Mode-IMM, and pairs them with weight transfer, L2-transfer regularization, and drop-transfer to balance new and old knowledge.
- Experimental results on datasets like MNIST, CIFAR-10, and ImageNet show that IMM outperforms methods such as EWC and LwF in preserving task performance.
Incremental Moment Matching for Catastrophic Forgetting in Neural Networks
The paper "Overcoming Catastrophic Forgetting by Incremental Moment Matching" proposes a novel approach to address the issue of catastrophic forgetting in neural networks. Catastrophic forgetting occurs when a neural network loses the ability to retain information from previous tasks after being trained on new tasks. This problem is particularly challenging in the context of continual learning, where the model should integrate knowledge from various sequential tasks.
Key Contributions
The authors introduce Incremental Moment Matching (IMM), a method grounded in Bayesian neural network principles. IMM mitigates catastrophic forgetting by incrementally matching the moments of the per-task posterior distributions, each approximated as a Gaussian, so that a single set of parameters balances the information captured for every task seen so far.
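In rough notation (my own paraphrase of the setup, assuming Gaussian approximate posteriors and mixing ratios alpha_k that sum to one), each task's posterior is approximated by a Gaussian, and the network that serves all tasks is a single Gaussian fit to the mixture of those per-task Gaussians:

```latex
% Per-task posteriors approximated by Gaussians; the combined posterior
% is one Gaussian chosen to approximate their mixture.
\[
q_k(\theta) \approx \mathcal{N}\!\left(\theta;\ \mu_k,\ \Sigma_k\right),
\qquad k = 1,\dots,K,
\qquad \sum_{k=1}^{K} \alpha_k = 1,
\]
\[
q_{1:K}(\theta) = \mathcal{N}\!\left(\theta;\ \mu_{1:K},\ \Sigma_{1:K}\right)
\ \approx\ \sum_{k=1}^{K} \alpha_k\, q_k(\theta).
\]
```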
IMM comes in two variants:
- Mean-IMM: Merges tasks by averaging the task-specific networks' parameters (the approximate posterior means), weighted by mixing ratios.
- Mode-IMM: Merges parameters via a Laplacian approximation to the mixture of Gaussian posteriors, targeting its mode and using each task's (diagonal) Fisher information as an estimate of the inverse covariance; both merges are sketched in code after this list.
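To make the two merges concrete, here is a minimal NumPy sketch of how the final parameters could be computed from per-task solutions. The function names, the use of a diagonal Fisher vector as a stand-in for the inverse covariance, and the equal mixing ratios in the toy usage are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def mean_imm(mus, alphas):
    """Mean-IMM: weighted average of per-task parameter vectors.

    mus:    list of K parameter vectors (one per task), each shape (D,)
    alphas: mixing ratios, length K, summing to 1
    """
    return sum(a * mu for a, mu in zip(alphas, mus))

def mode_imm(mus, fishers, alphas, eps=1e-8):
    """Mode-IMM: precision-weighted average that targets the mode of the
    mixture of Gaussian posteriors. Each task's inverse covariance is
    approximated by a diagonal Fisher information vector of shape (D,).
    """
    precision = sum(a * f for a, f in zip(alphas, fishers)) + eps
    weighted = sum(a * f * mu for a, f, mu in zip(alphas, fishers, mus))
    return weighted / precision

# Toy usage with two tasks and equal mixing ratios.
mu1, mu2 = np.random.randn(5), np.random.randn(5)
f1, f2 = np.abs(np.random.randn(5)), np.abs(np.random.randn(5))
theta_mean = mean_imm([mu1, mu2], [0.5, 0.5])
theta_mode = mode_imm([mu1, mu2], [f1, f2], [0.5, 0.5])
```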
Both variants are combined with transfer techniques: weight transfer (initializing the new task's network from the previous task's parameters), L2-transfer (an L2 penalty pulling the new parameters toward the previous task's), and a novel drop-transfer variant (sketched in code below), all of which smooth the path between task-specific posteriors.
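Drop-transfer can be pictured as dropout whose "zero point" is the previous task's solution rather than zero: dropped entries fall back to the old parameters while the rest keep learning the new task. The sketch below follows that reading; applying it at the level of individual weights and rescaling the kept residual in inverted-dropout style are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def drop_transfer(w_new, w_old, p=0.5, rng=None):
    """Drop-transfer, read as dropout with the previous task's weights as
    the 'zero point': with probability p each entry is reset to the old
    value, otherwise the current value is kept with an inverted-dropout
    rescaling of its deviation from the old value, so the expectation of
    the result equals w_new.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(w_new.shape) < p           # True -> fall back to old weights
    kept = w_old + (w_new - w_old) / (1.0 - p)   # rescale the retained residual
    return np.where(mask, w_old, kept)
```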
Experimental Results
The authors conduct extensive evaluations on multiple datasets, including MNIST variants, CIFAR-10, ImageNet, CUB, and a real-world Lifelog dataset. IMM achieves state-of-the-art results on these benchmarks, outperforming comparison methods such as Elastic Weight Consolidation (EWC) and Learning without Forgetting (LwF).
- In the disjoint MNIST experiments, combining weight transfer with IMM yields accuracies above 90%, a substantial improvement over plain SGD-based training.
- On the shuffled MNIST benchmark, a setting where EWC is known to excel, IMM remains robust and performs comparably to EWC.
- Mode-IMM, particularly when combined with drop-transfer, performs best overall, recovering performance on earlier tasks even in the more difficult settings.
Implications and Future Research
This paper presents a promising direction for continual learning frameworks, showcasing how Bayesian approaches and transfer learning techniques can be integrated effectively to address catastrophic forgetting. The results suggest that IMM offers a flexible mechanism to adjust information priority between older and newer tasks, a practical feature for dynamic, real-world applications.
The research opens avenues for exploring more intricate Gaussian priors and covariance structures in Bayesian neural networks to enhance task retention. Additionally, future investigations could focus on scaling these methods to more complex models and diverse datasets, further solidifying IMM’s practicality and broad applicability in artificial intelligence.
In conclusion, the contributions of IMM, with its principled Bayesian grounding and innovative transfer techniques, mark a significant step forward in the pursuit of robust continual learning mechanisms.