
Corridor Geometry in Gradient-Based Optimization

Published 13 Feb 2024 in stat.ML, cs.LG, and math.OC (arXiv:2402.08818v1)

Abstract: We characterize regions of a loss surface as corridors when the continuous curves of steepest descent -- the solutions of the gradient flow -- become straight lines. We show that corridors provide insights into gradient-based optimization, since corridors are exactly the regions where gradient descent and the gradient flow follow the same trajectory while the loss decreases linearly. As a result, inside corridors there are none of the implicit regularization effects or training instabilities that have been shown to arise from the drift between gradient descent and the gradient flow. Using the linear decrease of the loss inside corridors, we devise a learning rate adaptation scheme for gradient descent, which we call the Corridor Learning Rate (CLR). The CLR formulation coincides with a special case of the Polyak step-size, originally introduced in the context of convex optimization. The Polyak step-size has recently been shown to also have good convergence properties for neural networks; we further confirm this here with results on CIFAR-10 and ImageNet.
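The abstract states that the CLR scheme coincides with a special case of the Polyak step-size, eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2. The page does not include the paper's implementation, so the following is only a minimal sketch of the classical Polyak step-size rule applied to plain gradient descent; the function names (polyak_gd, f, grad_f) and the toy quadratic objective are illustrative assumptions, not the authors' code or the exact CLR formulation.

```python
import numpy as np

def polyak_gd(f, grad_f, x0, f_star=0.0, n_steps=100, eps=1e-12):
    """Gradient descent with the classical Polyak step size
    eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2.

    The paper's CLR is described as a special case of this rule;
    this sketch only illustrates the classical form.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_f(x)
        g_norm_sq = float(np.dot(g, g))
        if g_norm_sq < eps:                    # (near-)stationary point: stop
            break
        eta = (f(x) - f_star) / g_norm_sq      # Polyak step size
        x = x - eta * g
    return x

# Toy usage on the convex quadratic f(x) = 0.5 * ||x||^2, whose minimum value is f* = 0.
f = lambda x: 0.5 * float(np.dot(x, x))
grad_f = lambda x: x
x_min = polyak_gd(f, grad_f, x0=np.array([3.0, -2.0]))
print(x_min)   # approaches the origin
```

On this quadratic the rule gives a constant step of 0.5, so each iterate halves the distance to the minimizer; in general the step adapts to how far the current loss is above the assumed optimal value f_star.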

Authors (2)
