
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) (1306.2119v1)

Published 10 Jun 2013 in cs.LG, math.OC, and stat.ML

Abstract: We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/n^{1/2}). We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments on standard machine learning benchmarks showing that they often outperform existing approaches.

Citations (394)

Summary

  • The paper achieves an O(1/n) convergence rate by applying averaged SGD with a constant step-size for least-squares and logistic regression.
  • It introduces a novel stochastic gradient algorithm using local quadratic approximations to efficiently tackle non-strongly-convex loss functions.
  • Detailed non-asymptotic analyses and empirical benchmarks validate the approach, offering significant advancements in stochastic optimization.

An Analysis of Non-Strongly-Convex Smooth Stochastic Approximation and Convergence Rates

This paper addresses the challenge of optimizing convex functions where only unbiased estimates of gradients are available, a problem that is pivotal in machine learning contexts where empirical risk minimization is prevalent. The focus is on achieving a convergence rate of O(1/n) for non-strongly-convex problems, surpassing the typical O(1/√n) rate achieved by traditional methods such as Stochastic Gradient Descent (SGD).

Contributions and Methods

The authors present two algorithms aimed at improving convergence rates for least-squares and logistic regression scenarios. Noteworthy is the use of averaged stochastic gradient descent with a constant step-size for least-squares which achieves the target convergence rate. For logistic regression, an innovative stochastic gradient algorithm is introduced. This method constructs successive local quadratic approximations of the loss function while maintaining computational efficiency comparable to standard SGD.
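The least-squares half of this recipe can be sketched in a few lines: run SGD with a fixed step-size and return the running average of the iterates rather than the last one. The function name and the step-size heuristic (roughly 1/(4R²), with R² estimated as the mean squared feature norm) are illustrative choices for this sketch, not the paper's exact constants.

```python
import numpy as np

def averaged_sgd_least_squares(X, y, step_size=None):
    """Constant-step-size SGD for least-squares, returning the averaged iterate.

    Hedged sketch: one pass over the data, uniform averaging of iterates.
    The default step-size ~1/(4 R^2) is an illustrative heuristic.
    """
    n, d = X.shape
    if step_size is None:
        # Estimate R^2 = E||x||^2 from the data and set gamma ~ 1/(4 R^2).
        step_size = 1.0 / (4.0 * np.mean(np.sum(X**2, axis=1)))
    theta = np.zeros(d)       # current SGD iterate
    theta_bar = np.zeros(d)   # running average of iterates
    for k in range(n):
        x_k, y_k = X[k], y[k]
        # Stochastic gradient of the sample loss 0.5 * (x^T theta - y)^2.
        grad = (x_k @ theta - y_k) * x_k
        theta -= step_size * grad
        # Online update of the uniform average: bar_k = bar_{k-1} + (theta - bar_{k-1})/k.
        theta_bar += (theta - theta_bar) / (k + 1)
    return theta_bar
```

The key design point is that the averaged iterate, not the last one, enjoys the fast rate: the individual iterates oscillate in a ball around the optimum at constant step-size, and averaging cancels that noise.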

The novelty of this work lies in the ability of these algorithms to handle non-strongly-convex problems efficiently. Existing methods converge suboptimally on problems where the strong-convexity constant is negligible or zero.

Detailed non-asymptotic analyses of generalization errors in expectation, as well as high-probability bounds for least-squares, are provided, validating the theoretical claims. Additionally, empirical tests on standard machine learning benchmarks demonstrate that these methods frequently outperform conventional approaches.

Results and Analysis

The paper reveals strong numerical outcomes, validated through experiments across varying datasets. Key results include:

  • The achieved rate of O(1/n) convergence for both least-squares and logistic regression, without assuming strong convexity.
  • In comparison to conventional stochastic approximation techniques, the proposed algorithms yield improved approximation accuracy and efficiency, as evidenced by empirical benchmarks.

The success of constant-step-size averaged SGD in least-squares regression underscores the importance of properly tuning algorithmic parameters relative to problem characteristics. For logistic regression, the quadratic approximation strategy offers a robust remedy for the slow convergence caused by the non-linearity of the loss landscape.
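The quadratic-approximation idea for logistic regression can be illustrated as follows: each update follows the gradient of a second-order Taylor expansion of the current sample's loss taken around the running average of past iterates, which costs the same O(d) per step as plain SGD. This is a hedged sketch of the idea, not the paper's exact recursion; the step size and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_sgd_logistic(X, y, step_size=0.1):
    """SGD on local quadratic approximations of the logistic loss.

    Hedged sketch: each step linearizes the sample loss to second order
    around the running average theta_bar, then takes a gradient step on
    that surrogate. Labels y are assumed to lie in {-1, +1}.
    """
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for k in range(n):
        x_k, y_k = X[k], y[k]
        m = x_k @ theta_bar
        s = sigmoid(-y_k * m)
        grad_bar = -y_k * s * x_k        # gradient of logistic loss at theta_bar
        hess_w = s * (1.0 - s)           # scalar Hessian weight at theta_bar
        # Gradient of the quadratic surrogate, evaluated at the current iterate.
        grad = grad_bar + hess_w * (x_k @ (theta - theta_bar)) * x_k
        theta -= step_size * grad
        theta_bar += (theta - theta_bar) / (k + 1)  # running average
    return theta_bar
```

Because the Hessian of the logistic loss along a single sample is rank-one, the surrogate's gradient needs only inner products with x_k, which is what keeps the per-iteration cost at the level of standard SGD.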

Implications and Future Directions

The work extends the understanding of stochastic approximation in high-dimensional settings, suggesting significant practical implications for large-scale machine learning tasks. The approaches developed potentially open pathways to refine optimization procedures in scenarios lacking clear convexity properties.

From a theoretical standpoint, this paper enriches the landscape of optimization in machine learning, particularly underlining the role of averaging techniques and constant step-sizes. Moving forward, exploration into adaptively tuning step-sizes and enhancing robustness through dynamically scaled methodologies could yield further advancements. Furthermore, applying these techniques to non-parametric contexts and investigating updates of the approximation's support point at different iteration scales represent promising directions for subsequent inquiry.

In conclusion, this paper makes substantial contributions to the field of stochastic optimization by addressing the difficulties posed by non-strongly-convex functions, offering both novel theoretical insights and tangible algorithmic advancements.