Task Agnostic Continual Learning Using Online Variational Bayes
Continual learning faces a persistent challenge known as catastrophic forgetting, in which neural networks lose information from previously learned tasks as they adapt to new ones. Most existing strategies for continual learning assume explicit task boundaries during training. The paper under review presents a novel approach, called Bayesian Gradient Descent (BGD), designed for scenarios without defined task boundaries, termed task-agnostic continual learning. This is a significant step toward the realistic setting in which task transitions occur gradually and without clear demarcation.
Core Methodology
The approach is grounded in Bayesian inference, using an online version of variational Bayes to approximate the posterior distribution over network weights. BGD updates this posterior through a closed-form rule after every mini-batch, which is what enables task-agnostic operation: the algorithm never requires task-specific information, yet maintains robust performance across varied tasks. It relies on a mean-field approximation in which each weight has an independent Gaussian posterior, parameterized by a mean and a standard deviation, keeping inference tractable as the data distribution changes.
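To make the closed-form rule concrete, below is a minimal NumPy sketch of a BGD-style update for a diagonal Gaussian posterior over a weight vector. The function name, hyperparameters, and Monte Carlo sample count are illustrative assumptions rather than the authors' reference implementation; expectations over the posterior are estimated by sampling weights as theta = mu + sigma * eps.

```python
import numpy as np

def bgd_step(mu, sigma, grad_fn, eta=1.0, n_mc=10, rng=np.random.default_rng(0)):
    """One BGD-style update of a diagonal Gaussian posterior N(mu, sigma^2).

    A minimal sketch, not the authors' reference implementation.
    grad_fn(theta) should return dL/dtheta for sampled weights theta.
    """
    eps = rng.standard_normal((n_mc,) + mu.shape)
    grads = np.stack([grad_fn(mu + sigma * e) for e in eps])  # (n_mc, *mu.shape)

    e_grad = grads.mean(axis=0)              # E[dL/dtheta]
    e_grad_eps = (grads * eps).mean(axis=0)  # E[dL/dtheta * eps]

    # The posterior variance scales the mean step (uncertain weights move more),
    # while sigma shrinks when gradients correlate consistently with the sampling noise.
    mu_new = mu - eta * (sigma ** 2) * e_grad
    sigma_new = sigma * np.sqrt(1.0 + (0.5 * sigma * e_grad_eps) ** 2) \
        - 0.5 * (sigma ** 2) * e_grad_eps
    return mu_new, sigma_new
```

The key property visible here is that each weight's step size is governed by its own posterior variance, without any reference to task identity.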
Experimental Insights
Empirical evaluations were conducted in both discrete and continuous task-agnostic scenarios (task transitions that are abrupt but unannounced versus tasks that drift gradually into one another) using standard benchmarks such as Permuted MNIST and Split MNIST. BGD achieved competitive results, substantially reducing catastrophic forgetting and reaching accuracy comparable to benchmark methods such as Synaptic Intelligence (SI) and Elastic Weight Consolidation (EWC). Notably, BGD maintained accuracy even when task boundaries were ambiguous, reinforcing its utility in more complex, task-agnostic environments.
Further experiments with the "labels trick", an enhancement introduced in the paper, showed considerable improvements in the class-learning scenario, where task identity must be inferred at test time. The trick trains only the output heads corresponding to labels present in the current batch, which avoids spurious negative updates to the heads of classes that do not appear in that batch.
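Below is one possible PyTorch sketch of the labels trick for a single shared classification head; the function name and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def labels_trick_loss(logits, targets):
    """Cross-entropy restricted to the classes present in the current batch.

    logits:  (batch, n_total_classes) output of a single shared head
    targets: (batch,) class indices
    Only the logit columns of classes that actually appear in the batch
    contribute to the loss, so heads of absent classes receive no update.
    """
    present = torch.unique(targets)                     # classes seen in this batch
    remap = {c.item(): i for i, c in enumerate(present)}
    local_targets = torch.tensor([remap[t.item()] for t in targets],
                                 device=targets.device)
    return F.cross_entropy(logits[:, present], local_targets)
```

During training the loss is computed over the present classes only, while at test time the full head is used as usual.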
Theoretical and Practical Implications
The algorithm exploits the connection between variational inference and natural-gradient optimization: the effective learning rate of each weight is scaled by the uncertainty of its posterior. Weights the posterior is confident about, which are typically the ones important for previous tasks, change slowly, while uncertain weights remain plastic, so essential knowledge is retained without any explicit task information.
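Schematically, for a diagonal Gaussian posterior the mean of each weight is updated with a step modulated by that weight's posterior variance (notation follows the sketch above; the exact constants are those of the paper's derivation):

```latex
\mu_i \leftarrow \mu_i - \eta\,\sigma_i^{2}\,
  \mathbb{E}_{\varepsilon}\!\left[\frac{\partial L}{\partial \theta_i}\right],
\qquad
\theta_i = \mu_i + \sigma_i \varepsilon_i,\quad \varepsilon_i \sim \mathcal{N}(0,1),
```

so a small sigma (a confident posterior, usually reflecting importance to earlier tasks) yields a small step, while a large sigma lets the weight adapt freely to new data.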
Looking forward, task-agnostic continual learning lays the foundation for neural networks that operate in dynamic environments without being told when the data distribution changes, a setting much closer to real applications. Future research could explore non-diagonal or otherwise richer posterior approximations, potentially improving the adaptability and efficiency of BGD.
Conclusion
The presented BGD algorithm for task-agnostic continual learning demonstrates the relevance of Bayesian methods for mitigating catastrophic forgetting. The approach advances the theoretical framework of continual learning by applying variational Bayes in an online setting, and it also points toward practical improvements for systems facing continuously evolving tasks. The combination of a clear theoretical underpinning with empirical validation underscores BGD's potential as an effective continual learning strategy.