Hard ASH: Sparsity and the right optimizer make a continual learner
Published 26 Apr 2024 in cs.LG and cs.CV | (2404.17651v1)
Abstract: In class-incremental learning, neural networks typically suffer from catastrophic forgetting. We show that an MLP featuring a sparse activation function and an adaptive learning rate optimizer can compete with established regularization techniques on the Split-MNIST task. We highlight the effectiveness of the Adaptive SwisH (ASH) activation function in this context and introduce a novel variant, Hard Adaptive SwisH (Hard ASH), to further improve learning retention.
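For readers unfamiliar with the benchmark, the sketch below shows how Split-MNIST is commonly set up: the ten digit classes are partitioned into five tasks of two classes each, and the tasks are presented one after another. This is a minimal illustration, not the paper's code. The helper names `split_tasks` and `indices_per_task` and the toy label list are hypothetical, and the consecutive-pair split (0/1, 2/3, ...) is the usual convention, assumed here rather than confirmed by the abstract.

```python
from typing import List, Sequence, Tuple


def split_tasks(num_classes: int = 10, classes_per_task: int = 2) -> List[Tuple[int, ...]]:
    """Partition class ids 0..num_classes-1 into consecutive groups (one group per task)."""
    return [
        tuple(range(start, min(start + classes_per_task, num_classes)))
        for start in range(0, num_classes, classes_per_task)
    ]


def indices_per_task(labels: Sequence[int], tasks: List[Tuple[int, ...]]) -> List[List[int]]:
    """Group sample indices by the task that owns each sample's class label."""
    class_to_task = {c: t for t, classes in enumerate(tasks) for c in classes}
    buckets: List[List[int]] = [[] for _ in tasks]
    for i, y in enumerate(labels):
        buckets[class_to_task[y]].append(i)
    return buckets


# Hypothetical toy labels standing in for real MNIST targets.
labels = [3, 0, 9, 1, 4, 8, 2]
tasks = split_tasks()
buckets = indices_per_task(labels, tasks)

for task_id, (classes, idx) in enumerate(zip(tasks, buckets)):
    # A continual learner would train on each bucket in this order,
    # without revisiting samples from earlier tasks.
    print(f"task {task_id}: classes {classes}, sample indices {idx}")
```

In the class-incremental setting, the model is evaluated on all ten classes after the final task without being told which task a test sample belongs to, which is why forgetting of earlier classes shows up directly in the accuracy.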