Learning to Modulate Random Weights: Neuromodulation-inspired Neural Networks For Efficient Continual Learning (2204.04297v2)

Published 8 Apr 2022 in cs.LG and cs.CV

Abstract: Existing Continual Learning (CL) approaches have focused on addressing catastrophic forgetting by leveraging regularization methods, replay buffers, and task-specific components. However, realistic CL solutions must be shaped not only by metrics of catastrophic forgetting but also by computational efficiency and running time. Here, we introduce a novel neural network architecture inspired by neuromodulation in biological nervous systems to economically and efficiently address catastrophic forgetting and provide new avenues for interpreting learned representations. Neuromodulation is a biological mechanism that has received limited attention in machine learning; it dynamically controls and fine-tunes synaptic dynamics in real time to track the demands of different behavioral contexts. Inspired by this, our proposed architecture learns a relatively small set of parameters per task context that neuromodulates the activity of unchanging, randomized weights that transform the input. We show that this approach has strong learning performance per task despite the very small number of learnable parameters. Furthermore, because context vectors are so compact, multiple networks can be stored concurrently with no interference and little spatial footprint, thus completely eliminating catastrophic forgetting and accelerating the training process.
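
Below is a minimal, illustrative PyTorch sketch of the general idea described in the abstract: a layer whose weights are fixed and randomly initialized, with its activity multiplicatively modulated by a small learnable per-task context vector, so that only the context vector is trained for each task. The class name, the placement of the modulation, and all sizes here are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class NeuromodulatedLayer(nn.Module):
    """Sketch: a fixed random projection whose output is gated element-wise
    by a compact, learnable per-task context vector (hypothetical layout)."""

    def __init__(self, in_features: int, out_features: int, num_tasks: int):
        super().__init__()
        # Frozen random weights shared across all tasks (never trained).
        self.register_buffer(
            "random_weight",
            torch.randn(out_features, in_features) / in_features ** 0.5,
        )
        # One small context vector per task; these are the only trainable parameters.
        self.contexts = nn.Parameter(torch.ones(num_tasks, out_features))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Fixed random transform of the input ...
        h = torch.relu(x @ self.random_weight.t())
        # ... modulated by the task-specific context vector.
        return h * self.contexts[task_id]


# Usage sketch: train only the context row for the active task; switching tasks
# switches which row is used, so contexts learned earlier are never overwritten.
layer = NeuromodulatedLayer(in_features=784, out_features=256, num_tasks=5)
x = torch.randn(32, 784)
out = layer(x, task_id=0)  # shape: (32, 256)
```

Because the random projection is shared and frozen, per-task storage reduces to one context vector, which is what makes storing many task "networks" side by side cheap and interference-free in the scheme the abstract describes.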

Authors (2)
  1. Jinyung Hong (9 papers)
  2. Theodore P. Pavlic (15 papers)
Citations (3)
