
Step-size Optimization for Continual Learning (2401.17401v1)

Published 30 Jan 2024 in cs.LG and cs.AI

Abstract: In continual learning, a learner has to keep learning from the data over its whole lifetime. A key issue is to decide what knowledge to keep and what knowledge to let go. In a neural network, this can be implemented by using a step-size vector to scale how much gradient samples change network weights. Common algorithms, like RMSProp and Adam, use heuristics, specifically normalization, to adapt this step-size vector. In this paper, we show that those heuristics ignore the effect of their adaptation on the overall objective function, for example by moving the step-size vector away from better step-size vectors. On the other hand, stochastic meta-gradient descent algorithms, like IDBD (Sutton, 1992), explicitly optimize the step-size vector with respect to the overall objective function. On simple problems, we show that IDBD is able to consistently improve step-size vectors, whereas RMSProp and Adam do not. We explain the differences between the two approaches and their respective limitations. We conclude by suggesting that combining both approaches could be a promising future direction to improve the performance of neural networks in continual learning.
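For concreteness, below is a minimal sketch of the IDBD update (Sutton, 1992) that the abstract refers to, applied to online linear regression with a per-weight step-size alpha_i = exp(beta_i). The meta step-size theta, the initial step-size, and the toy data stream are illustrative assumptions, not values taken from the paper.

    import numpy as np

    def idbd_update(w, beta, h, x, y_target, theta=0.01):
        # One IDBD step (Sutton, 1992) for a linear predictor y = w . x.
        # Each weight w_i has its own step-size alpha_i = exp(beta_i); beta_i is
        # adjusted by a meta-gradient step on the squared prediction error.
        y = w @ x
        delta = y_target - y
        beta = beta + theta * delta * x * h      # meta-gradient step on log step-sizes
        alpha = np.exp(beta)                     # per-weight step-sizes
        w = w + alpha * delta * x                # ordinary update, scaled per weight
        # h_i is a decaying trace of recent updates to w_i; the max(0, .) keeps
        # the decay factor non-negative when alpha_i * x_i^2 exceeds 1.
        h = h * np.maximum(0.0, 1.0 - alpha * x ** 2) + alpha * delta * x
        return w, beta, h

    # Toy usage: track a fixed linear target from a noisy stream (assumed setup).
    rng = np.random.default_rng(0)
    d = 5
    w, h = np.zeros(d), np.zeros(d)
    beta = np.full(d, np.log(0.05))              # initial step-sizes of 0.05
    true_w = rng.normal(size=d)
    for t in range(1000):
        x = rng.normal(size=d)
        y_target = true_w @ x + 0.1 * rng.normal()
        w, beta, h = idbd_update(w, beta, h, x, y_target)

Unlike RMSProp or Adam, which rescale the gradient by running statistics of its recent magnitude, the beta update above is itself a stochastic gradient step on the prediction error with respect to the (log) step-sizes, which is the distinction the paper examines.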

References (30)
  1. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 12(6):1399–1409.
  2. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, volume 29.
  3. Exact natural gradient in deep linear networks and its application to the nonlinear case. In Advances in Neural Information Processing Systems, pages 5945–5954.
  4. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in Neural Information Processing Systems, 27.
  5. Natural neural networks. In Advances in Neural Information Processing Systems, pages 2071–2079.
  6. French, R. M. (1999). Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3(4):128–135.
  7. RMSProp: Divide the gradient by a running average of its recent magnitude. Coursera Neural Networks for Machine Learning, 6e:26–31.
  8. Meta-descent for online, continual prediction.
  9. Accelerating stochastic gradient descent using predictive variance reduction. Advances in Neural Information Processing Systems, 26.
  10. Learning feature relevance through step size adaptation in temporal-difference learning. CoRR, abs/1903.03252.
  11. Markerless tracking of complex human motions from multiple views. Computer Vision and Image Understanding, 104(2):190–209.
  12. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR.
  13. Algorithms for optimization. The MIT Press, Cambridge, MA.
  14. Learning the learning rate for prediction with expert advice. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., and Weinberger, K., editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc.
  15. Koop, A. (2008). Investigating Experience: Temporal Coherence and Empirical Knowledge Representation. PhD thesis, University of Alberta.
  16. Tuning-free step-size adaptation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012, pages 2121–2124. IEEE.
  17. A fast natural Newton method. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pages 623–630. Omnipress.
  18. No more pesky learning rates. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, volume 28 of JMLR Workshop and Conference Proceedings, pages 343–351. JMLR.org.
  19. Schraudolph, N. N. (1999). Online Learning with Adaptive Local Step Sizes. In Neural Nets — WIRN Vietri-99: Proceedings of the 11th Italian Workshop on Neural Networks, Perspectives in Neural Computing, pages 151–156, Vietri sul Mare, Salerno, Italy. Springer Verlag, Berlin.
  20. Online Independent Component Analysis With Local Learning Rate Adaptation. In Advances in Neural Information Processing Systems, volume 12, pages 789–795. The MIT Press, Cambridge, MA.
  21. Sutton, R. S. (1981). Adaptation of learning rate parameters. In Goal Seeking Components for Adaptive Intelligence: An Initial Assessment.
  22. Sutton, R. S. (1992). Adapting bias by gradient descent: An incremental version of delta-bar-delta. In Proceedings of the 10th National Conference on Artificial Intelligence, pages 171–176. AAAI Press / The MIT Press.
  23. Sutton, R. S. (2022). A history of meta-gradient: Gradient methods for meta-learning.
  24. On the role of tracking in stationary environments. In Proceedings of the 24th international conference on Machine learning, pages 871–878.
  25. Thill, M. (2015). Temporal difference learning methods with automatic step-size adaption for strategic board games: Connect-4 and dots-and-boxes. Cologne University of Applied Sciences Masters thesis.
  26. MetaGrad: Multiple learning rates in online learning. CoRR, abs/1604.08740.
  27. WNGrad: Learn the learning rate in gradient descent.
  28. Meta-gradient reinforcement learning with an objective discovered online. In Advances in Neural Information Processing Systems, volume 33, pages 15254–15264.
  29. Meta-gradient reinforcement learning. Advances in Neural Information Processing Systems, 31.
  30. Metatrace: Online step-size tuning by meta-gradient descent for reinforcement learning control. CoRR, abs/1805.04514.