Improving Line Search Methods for Large Scale Neural Network Training (2403.18519v1)

Published 27 Mar 2024 in cs.LG and cs.AI

Abstract: In recent studies, line search methods have shown significant improvements in the performance of traditional stochastic gradient descent techniques, eliminating the need for a specific learning rate schedule. In this paper, we identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness. We test these methods on larger datasets and more complex data domains than in previous work. Specifically, we improve the Armijo line search by integrating the momentum term from Adam into its search direction, enabling efficient large-scale training, a task that was previously prone to failure with Armijo line search methods. Our optimization approach outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer.
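
The sketch below illustrates the core idea the abstract describes: a backtracking Armijo line search evaluated along a momentum-based search direction, written in PyTorch. It is a minimal illustration, not the authors' released package; the names armijo_search, closure, and direction, as well as the default constants, are assumptions made for the example.

import torch

def armijo_search(params, closure, direction, lr_init=1.0, c=0.1, shrink=0.5, max_steps=20):
    """Backtracking Armijo line search along `direction` (one tensor per parameter).

    `closure` recomputes the mini-batch loss (forward pass only). In the paper's
    setting the direction would be built from the Adam momentum term; any descent
    direction works for this sketch.
    """
    with torch.no_grad():
        f0 = closure()
        # Directional derivative g^T d; it must be negative for a descent direction.
        g_dot_d = sum((p.grad * d).sum() for p, d in zip(params, direction))
        start = [p.detach().clone() for p in params]

        lr = lr_init
        for _ in range(max_steps):
            # Trial step: x + lr * d
            for p, x0, d in zip(params, start, direction):
                p.copy_(x0 + lr * d)
            # Armijo sufficient-decrease condition: f(x + lr*d) <= f(x) + c * lr * g^T d
            if closure() <= f0 + c * lr * g_dot_d:
                return lr
            lr *= shrink
        return lr  # accept the smallest trial step if the condition never held

In practice the search direction would be the negative (running) momentum estimate rather than the raw negative gradient; per the abstract, the authors package this logic as a hyperparameter-free PyTorch optimizer, so no step-size schedule needs to be tuned by hand.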
