Automating Continual Learning (arXiv:2312.00276v2)

Published 1 Dec 2023 in cs.LG

Abstract: General-purpose learning systems should improve themselves in an open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are forgotten when a new task is learned. Instead of hand-crafting new algorithms for avoiding CF, we propose Automated Continual Learning (ACL) to train self-referential neural networks to meta-learn their own in-context continual (meta-)learning algorithms. ACL encodes all desiderata -- good performance on both old and new tasks -- into its meta-learning objectives. Our experiments demonstrate that ACL effectively solves "in-context catastrophic forgetting"; our ACL-learned algorithms outperform hand-crafted ones, e.g., on the Split-MNIST benchmark in the replay-free setting, and enable continual learning of diverse tasks consisting of multiple few-shot and standard image classification datasets.
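
The central idea stated in the abstract, encoding good performance on both old and new tasks into a single meta-learning objective for a sequence model, can be illustrated with a short sketch. The code below is a hypothetical simplification and not the authors' implementation: a plain GRU stands in for the paper's self-referential weight matrix, and the names InContextLearner and acl_meta_loss are invented for illustration. It shows how a replay-free episode that presents task A and then task B in context could be scored on queries from both tasks, so that avoiding in-context forgetting is built into the meta-training loss.

```python
# Hypothetical sketch of an ACL-style meta-training objective (replay-free,
# two-task episodes). This is NOT the paper's architecture: a plain GRU
# stands in for the self-referential weight matrix used by the authors.

import torch
import torch.nn as nn
import torch.nn.functional as F


class InContextLearner(nn.Module):
    """Sequence model that reads labelled demonstrations in context and
    predicts labels for later, unlabelled query inputs."""

    def __init__(self, in_dim, num_classes, hidden=256):
        super().__init__()
        # each time step sees a flattened input plus a (possibly zeroed) label channel
        self.embed = nn.Linear(in_dim + num_classes, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, xs, label_channel):
        # xs: (B, T, in_dim), label_channel: (B, T, num_classes)
        h = self.embed(torch.cat([xs, label_channel], dim=-1))
        out, _ = self.rnn(h)
        return self.head(out)  # (B, T, num_classes)


def acl_meta_loss(model, task_a, task_b, num_classes):
    """After observing task A and then task B in context, the model is
    evaluated on queries from BOTH tasks: the task-A term penalises
    in-context forgetting, the task-B term rewards learning the new task."""
    (xa, ya, qxa, qya) = task_a  # demo (x, y) and query (x, y) tensors for task A
    (xb, yb, qxb, qyb) = task_b  # same for task B
    demo_x = torch.cat([xa, xb], dim=1)
    demo_lab = F.one_hot(torch.cat([ya, yb], dim=1), num_classes).float()
    query_x = torch.cat([qxa, qxb], dim=1)
    query_lab = torch.zeros_like(
        F.one_hot(torch.cat([qya, qyb], dim=1), num_classes).float()
    )  # labels are hidden at query positions
    logits = model(
        torch.cat([demo_x, query_x], dim=1),
        torch.cat([demo_lab, query_lab], dim=1),
    )[:, demo_x.shape[1]:]  # keep only the query positions
    targets = torch.cat([qya, qyb], dim=1)
    # old-task and new-task queries enter the same cross-entropy term
    return F.cross_entropy(logits.reshape(-1, num_classes), targets.reshape(-1))
```

In this sketch, meta-training would amount to ordinary gradient descent on acl_meta_loss over many randomly sampled task pairs (e.g., disjoint label subsets as in Split-MNIST), so that no rehearsal buffer or hand-crafted regulariser is used at test time; whatever prevents forgetting must be learned as in-context behaviour by the outer loop, which is the gist of the ACL objective described above.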

Authors (3)
  1. Kazuki Irie (35 papers)
  2. Róbert Csordás (25 papers)
  3. Jürgen Schmidhuber (124 papers)
Citations (3)

HackerNews

  1. Automating Continual Learning (9 points, 0 comments)