
Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization (1308.6594v2)

Published 29 Aug 2013 in math.OC

Abstract: This paper considers a class of constrained stochastic composite optimization problems whose objective function is given by the summation of a differentiable (possibly nonconvex) component and a certain non-differentiable (but convex) component. In order to solve these problems, we propose a randomized stochastic projected gradient (RSPG) algorithm, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed. The RSPG algorithm also employs a general distance function to allow taking advantage of the geometry of the feasible region. Complexity of this algorithm is established in a unified setting, which shows nearly optimal complexity of the algorithm for convex stochastic programming. A post-optimization phase is also proposed to significantly reduce the variance of the solutions returned by the algorithm. In addition, based on the RSPG algorithm, a stochastic gradient-free algorithm, which only uses stochastic zeroth-order information, is also discussed. Some preliminary numerical results are provided.

Citations (469)

Summary

  • The paper proposes a randomized stochastic projected gradient (RSPG) algorithm for nonconvex composite optimization that selects mini-batch sizes according to the total sampling budget.
  • It establishes nearly optimal complexity for convex cases while demonstrating effective convergence in nonconvex settings, as validated by experiments on SVMs and penalized least squares.
  • The study also introduces a post-optimization phase to minimize solution variance and extends the method to gradient-free settings using stochastic zeroth-order information.

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization

The research by Ghadimi, Lan, and Zhang introduces stochastic approximation methods for nonconvex stochastic composite optimization problems, in which the objective is the sum of a nonconvex differentiable component and a nondifferentiable convex component. This structure arises frequently in machine learning and statistics, particularly in models with structured sparsity or nonsmooth regularization.

The authors propose a randomized stochastic projected gradient (RSPG) algorithm that selects a suitable mini-batch of samples at each iteration, a choice that matters because the total budget of stochastic samples is fixed in advance. The algorithm also employs a general prox (distance) function, which lets it exploit the geometry of the feasible region, and its analysis recovers nearly optimal complexity for convex stochastic programming.
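To make the iteration concrete, the following is a minimal sketch of an RSPG-style loop in the Euclidean special case. The helper names (`grad_oracle`, `prox`) and the uniform random choice of the returned iterate are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def rspg(grad_oracle, prox, x0, stepsize, batch_size, num_iters, rng=None):
    """Minimal RSPG-style sketch (hypothetical helpers, Euclidean prox case).

    grad_oracle(x, m): average of m stochastic gradients of the smooth
                       (possibly nonconvex) component at x.
    prox(x, g, gamma): solves argmin_u <g, u> + (1/(2*gamma))*||u - x||^2 + h(u)
                       over the feasible set, i.e. the composite prox/projection step.
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    iterates = []
    for _ in range(num_iters):
        g = grad_oracle(x, batch_size)   # mini-batch stochastic gradient
        x = prox(x, g, stepsize)         # projected / proximal gradient step
        iterates.append(x.copy())
    # Randomized output: return one iterate chosen at random (uniform here),
    # which is the device that makes the nonconvex complexity analysis work.
    return iterates[rng.integers(len(iterates))]
```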

The paper establishes the complexity of the RSPG algorithm within a unified framework, measuring progress through a generalized projected gradient (gradient mapping). Notably, the analysis shows convergence to approximate stationary points even when the smooth component is nonconvex, which extends stochastic approximation methods well beyond the traditional convex setting.
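For reference, stationarity in this composite setting is typically measured through the gradient mapping associated with the prox step; a sketch in the Euclidean case, with notation chosen here for illustration, is:

```latex
% Composite prox step and gradient mapping (Euclidean case, illustrative notation):
% x is the current point, g a (mini-batch) gradient estimate, gamma a stepsize,
% h the convex nonsmooth term, and X the feasible set.
x^{+} \;=\; \arg\min_{u \in X} \Big\{ \langle g,\, u \rangle
      \;+\; \tfrac{1}{2\gamma}\,\|u - x\|^{2} \;+\; h(u) \Big\},
\qquad
P_{X}(x, g, \gamma) \;=\; \tfrac{1}{\gamma}\,\bigl(x - x^{+}\bigr).
```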

Interestingly, the paper introduces a post-optimization phase designed to reduce the variance of the solution returned by the algorithm. The optimization phase runs RSPG several times independently; the post-optimization phase then selects the candidate whose estimated gradient mapping norm, computed from an additional batch of samples, is smallest. Convergence guarantees, including sharper large-deviation bounds under light-tail noise assumptions, are extended to this two-phase approach.
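A schematic of the two-phase procedure, assuming hypothetical helpers `rspg_run` (one independent RSPG run) and `grad_estimate` (a sampled estimate of the gradient mapping used only for selection):

```python
import numpy as np

def two_phase_rspg(rspg_run, grad_estimate, num_runs, val_batch_size):
    """Sketch of the two-phase (post-optimization) scheme.

    rspg_run():          one independent RSPG run, returning a candidate solution x.
    grad_estimate(x, m): a mini-batch estimate (m samples) of the gradient mapping
                         at x, used only to rank candidates.
    """
    candidates = [rspg_run() for _ in range(num_runs)]           # optimization phase
    scores = [np.linalg.norm(grad_estimate(x, val_batch_size))   # post-optimization phase
              for x in candidates]
    return candidates[int(np.argmin(scores))]                    # keep the best candidate
```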

The paper also contributes a gradient-free variant of the RSPG algorithm based on stochastic zeroth-order information. This adaptation is particularly valuable when only noisy function values are accessible, as in derivative-free or simulation-based optimization, or when gradients are impractical to compute directly.
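The zeroth-order variant replaces the stochastic gradient with a finite-difference estimate along random directions. The sketch below uses the common Gaussian-smoothing construction; the parameter names and defaults are illustrative assumptions.

```python
import numpy as np

def zeroth_order_grad(f, x, smoothing=1e-3, batch_size=32, rng=None):
    """Gaussian-smoothing gradient estimator (a standard construction; the
    gradient-free variant plugs an estimator of this kind into the RSPG loop).

    f(x): returns a noisy function value at x (stochastic zeroth-order oracle).
    """
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(batch_size):
        u = rng.standard_normal(d)  # random Gaussian direction
        g += (f(x + smoothing * u) - f(x)) / smoothing * u
    return g / batch_size
```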

Numerical experiments conducted by the authors underscore the efficiency of RSPG and its variants on nonconvex problems such as semi-supervised support vector machines and penalized least-squares problems with nonconvex regularization. In these tests, the RSPG algorithm requires fewer iterations to converge and attains lower objective values than the competing methods considered.

This work has both practical and theoretical implications. Practically, it provides more efficient algorithms for constrained, high-dimensional optimization tasks in machine learning. Theoretically, it advances the understanding of stochastic approximation in nonconvex landscapes, opening the door to further algorithmic refinements and to applications in areas such as nonconvex games or deep learning. Future research could explore adaptive mini-batch sizes or refine the post-optimization phase to handle broader classes of stochastic problems.