Corridor Geometry in Gradient-Based Optimization
Abstract: We characterize regions of a loss surface as corridors when the continuous curves of steepest descent -- the solutions of the gradient flow -- become straight lines. We show that corridors provide insights into gradient-based optimization, since corridors are exactly the regions where gradient descent and the gradient flow follow the same trajectory while the loss decreases linearly. As a result, inside corridors there are none of the implicit regularization effects or training instabilities that have been shown to arise from the drift between gradient descent and the gradient flow. Exploiting the linear decrease of the loss on corridors, we devise a learning rate adaptation scheme for gradient descent, which we call the Corridor Learning Rate (CLR). The CLR formulation coincides with a special case of the Polyak step-size, originally proposed in the context of convex optimization. The Polyak step-size has recently been shown to also have good convergence properties for neural networks; we further confirm this here with results on CIFAR-10 and ImageNet.
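The abstract does not spell out the CLR formula, but it states that CLR coincides with a special case of the classical Polyak step-size, eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2. As a rough illustration (not the paper's exact scheme), a minimal sketch of gradient descent with the Polyak step-size, assuming a known optimal value f*, might look like:

```python
import numpy as np

def polyak_gd(f, grad_f, x0, f_star=0.0, steps=100):
    """Gradient descent with the classical Polyak step-size:
    eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2.
    This is an illustrative sketch, not the paper's CLR implementation."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        gnorm2 = float(np.dot(g, g))
        if gnorm2 == 0.0:
            break  # stationary point reached
        eta = (f(x) - f_star) / gnorm2  # adaptive step, no tuning needed
        x = x - eta * g
    return x

# Toy example: f(x) = 0.5 * ||x||^2, minimized at 0 with f* = 0.
f = lambda x: 0.5 * float(np.dot(x, x))
grad_f = lambda x: x
x_min = polyak_gd(f, grad_f, np.array([3.0, -4.0]))
```

On this quadratic the Polyak rule yields a constant step of 0.5, so the iterates halve toward the minimizer at each step; in general the step adapts automatically to the local geometry, which is the property the paper connects to the linear loss decrease inside corridors.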