Barzilai-Borwein Step Size for Stochastic Gradient Descent (1605.04131v2)

Published 13 May 2016 in math.OC, cs.LG, and stat.ML

Abstract: One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size, or to tune a fixed step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result is missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

Summary

The paper proposes computing step sizes for SGD and SVRG automatically with the Barzilai-Borwein (BB) method, removing the need for manual tuning.
For SGD-BB, it adds a smoothing technique that stabilizes the computed step sizes from one epoch to the next.
It proves linear convergence of SVRG-BB for strongly convex objectives, and experiments show SGD-BB and SVRG-BB performing comparably to, and sometimes better than, SGD and SVRG with best-tuned step sizes.

Barzilai-Borwein Step Size for Stochastic Gradient Descent

The paper addresses step-size selection in stochastic gradient descent (SGD) and its variant, the stochastic variance reduced gradient (SVRG) method. The authors use the Barzilai-Borwein (BB) method to compute step sizes automatically, which yields two algorithms: SGD-BB and SVRG-BB. The underlying problem is that traditional line search does not apply to stochastic methods, so practitioners either use a diminishing step size or tune a fixed one by hand, which is time-consuming and can still leave the step size suboptimal.

BB Method Application and Linear Convergence

The Barzilai-Borwein method, originally developed for deterministic nonlinear optimization, avoids step-size tuning by computing the step size from the differences between successive iterates and between their gradients. In the deterministic setting, the BB step size is chosen to minimize the residual of the secant equation, so that a scalar multiple of the identity mimics the curvature seen along the most recent step. Extending this to the stochastic setting, the SVRG-BB algorithm computes a BB step size inside the SVRG iterations. The authors prove that SVRG-BB converges linearly for strongly convex objectives. As a by-product, they also prove linear convergence of SVRG with Option I, a result that was missing from the literature.
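To make the idea concrete, here is a minimal Python sketch of an SVRG loop with a BB step size. It is an illustration under stated assumptions, not the authors' reference implementation. The BB step is taken as ||s||^2 / (s^T y), with s the change between two consecutive snapshot points and y the change in their full gradients, divided by the inner-loop length m. The last inner iterate is kept as the next snapshot, which is how Option I is read here. The function name `svrg_bb`, its parameters, and the `s @ y > 0` safeguard are hypothetical choices made for this example.

```python
import numpy as np

def svrg_bb(grad_i, n, x0, m, eta0, epochs, seed=0):
    """SVRG whose step size is recomputed each epoch from a BB formula.

    grad_i(x, i) returns the gradient of the i-th component function at x.
    """
    rng = np.random.default_rng(seed)
    x_tilde = np.asarray(x0, dtype=float).copy()
    prev_x = prev_g = None
    eta = eta0  # used in the first epoch, before any BB pair exists
    for _ in range(epochs):
        # Full gradient at the current snapshot point.
        g = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        if prev_x is not None:
            s = x_tilde - prev_x  # change in snapshot
            y = g - prev_g        # change in full gradient
            sy = s @ y
            if sy > 0:            # safeguard added for this sketch
                eta = (s @ s) / (m * sy)
        prev_x, prev_g = x_tilde, g
        x = x_tilde.copy()
        for _ in range(m):
            i = int(rng.integers(n))
            # Variance-reduced stochastic gradient step.
            x = x - eta * (grad_i(x, i) - grad_i(x_tilde, i) + g)
        x_tilde = x  # assumed Option I: last inner iterate is the next snapshot
    return x_tilde
```

For a strongly convex objective, s^T y is positive whenever the snapshot moves, so the safeguard only matters in degenerate cases such as a snapshot that did not change. Only `eta0` for the first epoch has to be supplied by the user.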

Numerical Performance and Step-Size Optimization

The paper reports numerical experiments comparing SGD-BB and SVRG-BB with standard SGD and SVRG on logistic regression and support vector machine problems. On standard data sets, the BB variants perform comparably to, and sometimes better than, SGD and SVRG with best-tuned step sizes, and they outperform some advanced SGD variants.

Addressing Variance and BB Step Incorporation

A key concern with stochastic methods is the variance of the gradient estimates. Plain SGD does not reduce this variance, which slows convergence. To address this, the authors propose a smoothing technique for SGD-BB that stabilizes the step sizes across epochs. The smoothing estimates how the step size decays over epochs and uses that estimate to damp large fluctuations from one epoch to the next. The paper also outlines how the BB step-size computation extends to other SGD variants, such as the stochastic average gradient (SAG) method.
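The smoothing idea can be illustrated with a short Python sketch. This summary does not give the exact formula, so the code rests on an assumption: the step size is modeled as eta_k ≈ C / phi(k) for an increasing function phi, and C is estimated as the running geometric mean of eta_hat_j * phi(j) over the raw step sizes seen so far. The names `smooth_steps` and `phi`, and the default phi(k) = k + 1, are hypothetical choices for illustration.

```python
import math

def smooth_steps(raw_steps, phi=lambda k: k + 1.0):
    """Smooth noisy per-epoch step sizes by fitting eta_k ~ C / phi(k).

    raw_steps must be positive; phi is the assumed decay model.
    """
    smoothed, log_sum = [], 0.0
    for k, eta_hat in enumerate(raw_steps, start=1):
        # Accumulate log(eta_hat_j * phi(j)) to form a geometric mean.
        log_sum += math.log(eta_hat * phi(k))
        c_hat = math.exp(log_sum / k)  # running estimate of C
        smoothed.append(c_hat / phi(k))
    return smoothed
```

If the raw steps already follow C / phi(k) exactly, the output equals the input. A single outlier is damped because it enters the estimate of C only through a geometric mean over all epochs so far.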

Implications and Future Direction

The proposed methods are most relevant to large-scale machine learning, where optimization efficiency is crucial and hand-tuning step sizes is expensive. Computing the step size automatically reduces manual intervention and problem-specific tuning, which makes the methods easier to apply across different settings. The convergence proofs cover strongly convex objectives; extending the analysis to nonconvex problems is a possible direction for future work. Another is to bring other quasi-Newton ideas into stochastic methods.

In summary, bringing the Barzilai-Borwein step size into stochastic gradient methods addresses the long-standing problem of step-size tuning, and the smoothing technique keeps the computed step sizes stable in SGD-BB. The empirical results indicate that both algorithms are practical for machine learning applications.
