
Global Convergence of Online Limited Memory BFGS (1409.2045v1)

Published 6 Sep 2014 in math.OC, cs.LG, and stat.ML

Abstract: Global convergence of an online (stochastic) limited memory version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method for solving optimization problems with stochastic objectives that arise in large scale machine learning is established. Lower and upper bounds on the Hessian eigenvalues of the sample functions are shown to suffice to guarantee that the curvature approximation matrices have bounded determinants and traces, which, in turn, permits establishing convergence to optimal arguments with probability 1. Numerical experiments on support vector machines with synthetic data showcase reductions in convergence time relative to stochastic gradient descent algorithms as well as reductions in storage and computation relative to other online quasi-Newton methods. Experimental evaluation on a search engine advertising problem corroborates that these advantages also manifest in practical applications.

Citations (163)

Summary

  • The paper establishes global convergence guarantees with probability 1 for Online Limited Memory BFGS (oLBFGS) in stochastic optimization problems.
  • oLBFGS adapts LBFGS for stochastic settings by using a limited window of gradient information and controlling curvature estimates for numerical stability.
  • Theoretical analysis shows oLBFGS matches the O(1/t) expected convergence rate of SGD, while numerical experiments on synthetic and real-world datasets demonstrate reductions in convergence time and in the amount of data required for training.

Global Convergence of Online Limited Memory BFGS: An Overview

The paper "Global Convergence of Online Limited Memory BFGS" by Aryan Mokhtari and Alejandro Ribeiro addresses the application of the limited memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method in solving stochastic optimization problems, particularly focusing on scenarios prevalent in large-scale machine learning tasks. The primary contribution of this paper is the establishment of global convergence guarantees for an online (stochastic) limited memory variant of BFGS, referred to as oLBFGS. This paper offers both theoretical insights and practical implications of using oLBFGS in optimization problems where sample functions exhibit stochasticity and involve high-dimensional datasets.

The motivation behind this work is the well-known challenge posed by the computation and storage constraints associated with large-scale optimization problems. Stochastic Gradient Descent (SGD) and its variants are typically employed owing to their computational efficiency; however, they often exhibit slow convergence rates on ill-conditioned functions. Conversely, Newton's method offers rapid convergence in such contexts but entails prohibitive computational costs due to Hessian evaluations. Quasi-Newton methods like BFGS present an intermediate approach, accelerating convergence without direct Hessian computations, yet the traditional form of these methods is not readily applicable to stochastic settings.

The authors focus on the limited memory BFGS (LBFGS), which approximates the Hessian inverse using recent curvature information to curb computational and storage demands. They introduce a stochastic adaptation of LBFGS, ensuring that only a recent window of gradient information is used, allowing for significant reductions in memory and computational requirements per iteration. Critical to this adaptation are the stochastic curvature updates, which can potentially lead to numerical instability. The authors tackle this challenge by incorporating theoretical limits on the curvature estimates' determinants and traces, thus controlling the condition number of the approximation matrices.
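
To make the update concrete, the sketch below shows an oLBFGS-style iteration under the assumptions just described: a two-loop recursion over a limited window of curvature pairs, where each pair is built from gradient differences of the same sample and includes a regularization term (delta here) intended to keep the curvature estimates well conditioned. This is a minimal sketch, not the paper's reference implementation; the sampler callback, the step-size schedule, and the default constants are illustrative assumptions.

```python
import numpy as np
from collections import deque

def olbfgs_direction(grad, pairs, delta=0.1):
    """Two-loop recursion: approximate the product of an inverse-Hessian
    estimate with grad, using a limited window of (variable-difference,
    gradient-difference) pairs, and return the negated result."""
    q = grad.copy()
    alphas = []
    # Backward pass over the stored curvature pairs, most recent first.
    for v, r in reversed(pairs):
        rho = 1.0 / np.dot(r, v)
        a = rho * np.dot(v, q)
        alphas.append((a, rho, v, r))
        q -= a * r
    # Initial inverse-Hessian guess: scaled identity from the latest pair,
    # or 1/delta when no curvature information has been collected yet.
    if pairs:
        v, r = pairs[-1]
        q *= np.dot(v, r) / np.dot(r, r)
    else:
        q /= delta
    # Forward pass, oldest pair first.
    for a, rho, v, r in reversed(alphas):
        b = rho * np.dot(r, q)
        q += (a - b) * v
    return -q

def olbfgs(stoch_grad, w0, draw_sample, n_iters=1000, memory=10,
           eps0=0.1, delta=0.1):
    """Online LBFGS sketch: each iteration draws a sample, takes a
    quasi-Newton step along the two-loop direction, and stores a
    regularized curvature pair computed from gradients of the same
    sample at the old and new iterates."""
    w = w0.copy()
    pairs = deque(maxlen=memory)        # limited memory window
    for t in range(n_iters):
        sample = draw_sample()
        g = stoch_grad(w, sample)
        d = olbfgs_direction(g, list(pairs), delta)
        step = eps0 / (1.0 + t)         # diminishing step size
        w_new = w + step * d
        # Curvature pair from the same sample; the -delta*v correction
        # mirrors the regularization the analysis uses to bound the
        # determinant and trace of the curvature approximation.
        v = w_new - w
        r = stoch_grad(w_new, sample) - g - delta * v
        pairs.append((v, r))
        w = w_new
    return w
```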

Remarkably, the paper provides a rigorous mathematical proof that the oLBFGS algorithm converges with probability 1 to the optimal argument. This probabilistic guarantee is crucial, as it assures researchers and practitioners of the robustness of the method against the variability induced by stochastic gradients. Furthermore, the authors establish that oLBFGS converges in expectation at a rate of order O(1/t), matching the best achievable rates for first-order stochastic methods such as SGD.
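
In symbols (chosen here for readability rather than taken verbatim from the paper), the two guarantees can be summarized as follows, where w_t denotes the iterate, w^* the minimizer of the expected objective F, and the step sizes are suitably diminishing:

```latex
% Almost-sure convergence of the iterates to the optimal argument.
\Pr\!\left[\lim_{t\to\infty}\,\lVert w_t - w^* \rVert = 0\right] = 1,
% Expected suboptimality decays at rate O(1/t), for some constant C_0
% depending on the problem conditioning and the step-size parameters.
\mathbb{E}\!\left[F(w_t) - F(w^*)\right] \;\le\; \frac{C_0}{t}.
```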

The practical effectiveness of oLBFGS is evidenced through numerical experiments on support vector machine (SVM) problems with synthetic data. These experiments demonstrate reduced convergence time relative to SGD and reduced storage and computation relative to other online quasi-Newton methods, validating the theoretical benefits outlined in the paper. The paper also extends this evaluation to a real-world search engine advertising problem. Here, oLBFGS drastically reduces the amount of data needed to train a logistic regressor, showing that it can outperform SGD by a significant margin in practical applications.
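
For concreteness, a sample function of the kind optimized in these experiments, a regularized logistic loss on a single data point with labels in {-1, +1}, could supply the stochastic gradients for the sketch above. The regularization weight lam and the exact loss form are illustrative assumptions, not the paper's precise experimental setup.

```python
import numpy as np

def logistic_sample_grad(w, sample, lam=1e-3):
    """Gradient of the sample function
    f(w; x, y) = (lam/2)*||w||^2 + log(1 + exp(-y * w.x)), y in {-1, +1}."""
    x, y = sample
    margin = y * np.dot(w, x)
    return lam * w - y * x / (1.0 + np.exp(margin))

# Hypothetical usage with the oLBFGS sketch above, assuming a data sampler:
# w = olbfgs(logistic_sample_grad, np.zeros(d), draw_sample, memory=10)
```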

The implications of this research are manifold. Theoretically, it advances the understanding of quasi-Newton methods in stochastic optimization, precisely delineating conditions for global convergence. Practically, it equips practitioners in machine learning and related fields with a robust algorithm to deal with large-scale data, where traditional methods struggle with efficiency and convergence speed.

Looking forward, the development of oLBFGS represents a step towards more efficient optimization algorithms that manage the trade-offs between convergence speed, computational cost, and memory usage in data-intensive applications. Future research could extend these principles to more complex models and further integrate adaptive mechanisms to enrich the applicability of quasi-Newton methods in various domains.