Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization (2401.08665v1)
Abstract: We consider the minimization of a Lipschitz continuous and expectation-valued function, defined as $f(\mathbf{x}) \triangleq \mathbb{E}[{\tilde f}(\mathbf{x}, \boldsymbol{\xi})]$, over a closed and convex set. Our focus lies on obtaining both asymptotic as well as rate and complexity guarantees for computing an approximate stationary point (in the Clarke sense) via zeroth-order schemes. We adopt a smoothing-based approach reliant on minimizing $f_{\eta}$, where $f_{\eta}(\mathbf{x}) = \mathbb{E}_{\mathbf{u}}[f(\mathbf{x}+\eta \mathbf{u})]$, $\mathbf{u}$ is a random vector supported on the unit sphere, and $\eta > 0$. It has been observed that a stationary point of the $\eta$-smoothed problem is a $2\eta$-stationary point of the original problem in the Clarke sense. In this setting, we develop two schemes with promising empirical behavior. (I) We develop a smoothing-enabled variance-reduced zeroth-order gradient framework (VRG-ZO) and make two contributions for the sequence generated by this scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, yielding guarantees for $\eta$-Clarke stationary solutions of the original problem; (b) computing an $\mathbf{x}$ at which the expected norm of the residual of the $\eta$-smoothed problem is within $\epsilon$ requires no more than $O(\eta^{-1} \epsilon^{-2})$ projection steps and $O\left(\eta^{-2} \epsilon^{-4}\right)$ function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) reliant on a combination of randomized and Moreau smoothing; the corresponding iteration and sample complexities for this scheme are $O\left(\eta^{-5}\epsilon^{-2}\right)$ and $O\left(\eta^{-7}\epsilon^{-4}\right)$, respectively.
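The abstract describes the smoothing-based zeroth-order approach only at a high level. The Python sketch below illustrates the generic ingredients such a scheme rests on: a two-point sphere-sampling estimator of the gradient of the $\eta$-smoothed function and a projected zeroth-order stochastic gradient loop. It is not the paper's VRG-ZO algorithm (no variance-reduction, batch-size, or step-size policy from the paper is reproduced); the names `zo_gradient`, `project_ball`, and `vrg_zo_sketch`, the $n/(2\eta)$ scaling, and the ball-shaped feasible set are illustrative assumptions.

```python
import numpy as np

def sphere_sample(n, rng):
    """Draw a point uniformly from the unit sphere in R^n."""
    u = rng.standard_normal(n)
    return u / np.linalg.norm(u)

def zo_gradient(f_tilde, x, eta, rng, batch_size=1):
    """Two-point zeroth-order estimator of the gradient of the eta-smoothed
    function (spherical smoothing).  The n/(2*eta) scaling is the standard
    choice for sphere sampling; the paper's exact estimator may differ."""
    n = x.size
    g = np.zeros(n)
    for _ in range(batch_size):
        u = sphere_sample(n, rng)
        xi = rng.integers(10**6)  # placeholder index for the random sample xi
        # Same xi at both points (common random numbers) to reduce variance.
        g += (f_tilde(x + eta * u, xi) - f_tilde(x - eta * u, xi)) * u
    return (n / (2.0 * eta * batch_size)) * g

def project_ball(x, radius=10.0):
    """Euclidean projection onto a ball -- a stand-in for projection onto a
    generic closed convex feasible set."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

def vrg_zo_sketch(f_tilde, x0, eta, steps, gamma, rng, batch_size):
    """Projected zeroth-order stochastic gradient loop on the smoothed problem."""
    x = x0.copy()
    for _ in range(steps):
        g = zo_gradient(f_tilde, x, eta, rng, batch_size)
        x = project_ball(x - gamma * g)
    return x

# Toy usage on an illustrative nonsmooth stochastic objective f(x) = E[|a^T x - b(xi)|].
rng = np.random.default_rng(0)
a = rng.standard_normal(20)
f_tilde = lambda x, xi: abs(a @ x - np.sin(xi))
x_final = vrg_zo_sketch(f_tilde, np.ones(20), eta=0.1, steps=500,
                        gamma=0.01, rng=rng, batch_size=8)
```

In this sketch, increasing `batch_size` plays the role of the growing sample sizes used by variance-reduced schemes, and the smoothing parameter `eta` trades off bias toward the original nonsmooth problem against the variance of the zeroth-order estimator.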