Hard ASH: Sparsity and the right optimizer make a continual learner

Published 26 Apr 2024 in cs.LG and cs.CV | (2404.17651v1)

Abstract: In class-incremental learning, neural networks typically suffer from catastrophic forgetting. We show that an MLP featuring a sparse activation function and an adaptive learning rate optimizer can compete with established regularization techniques on the Split-MNIST task. We highlight the effectiveness of the Adaptive SwisH (ASH) activation function in this context and introduce a novel variant, Hard Adaptive SwisH (Hard ASH), to further enhance learning retention.
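
For readers unfamiliar with the setup, the following is a minimal sketch of how a class-incremental Split-MNIST task sequence is commonly built: the ten digit classes are grouped into five disjoint two-class tasks that are presented one after another. The five-by-two grouping is the usual convention and is assumed here rather than taken from the abstract. The helper names `make_class_incremental_tasks` and `indices_for_task` are hypothetical, and a short toy label list stands in for the real MNIST labels.

```python
from typing import List, Sequence, Tuple


def make_class_incremental_tasks(
    num_classes: int = 10, classes_per_task: int = 2
) -> List[Tuple[int, ...]]:
    """Group class ids into consecutive, disjoint tasks (assumed 5 x 2 split)."""
    if num_classes % classes_per_task != 0:
        raise ValueError("num_classes must be divisible by classes_per_task")
    return [
        tuple(range(start, start + classes_per_task))
        for start in range(0, num_classes, classes_per_task)
    ]


def indices_for_task(labels: Sequence[int], task_classes: Sequence[int]) -> List[int]:
    """Return the positions of all examples whose label belongs to the task."""
    wanted = set(task_classes)
    return [i for i, y in enumerate(labels) if y in wanted]


if __name__ == "__main__":
    # Toy labels standing in for the MNIST training labels (illustrative only).
    toy_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 3]
    for task_id, classes in enumerate(make_class_incremental_tasks()):
        # A continual learner would train on each subset in turn,
        # without revisiting the examples of earlier tasks.
        print(task_id, classes, indices_for_task(toy_labels, classes))
```

In the class-incremental scenario, the model is trained on these subsets sequentially and is then evaluated on all ten classes at once, without being told which task a test example came from.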

Authors (1)