A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization (2208.00290v4)
Abstract: In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples; only the noisy samples are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are drawn from a truncated Cauchy distribution on the delta sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is particularly useful when the objective function is non-convex and the parameter dimension is high. Through an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function, and we derive its asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis; in particular, we establish a non-asymptotic bound for finding an epsilon-stationary point of the non-convex objective function. Finally, we demonstrate through simulations that our algorithm outperforms the Gaussian smoothed functional (GSF), simultaneous perturbation stochastic approximation (SPSA), and random directions stochastic approximation (RDSA) algorithms by a significant margin on several non-convex settings, and we further validate its performance on convex (noisy) objectives.
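As a rough illustration of the gradient estimation scheme sketched in the abstract, the snippet below implements a one-sided smoothed-functional gradient estimate whose perturbation directions have truncated-Cauchy components normalized onto the unit sphere, so that the evaluation point theta + delta*d lies on the delta sphere. The sampling recipe, the one-sided estimator form, the scaling constant dim/delta, and the step-size schedule are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

def truncated_cauchy_direction(dim, trunc=5.0, rng=None):
    """Sample components from a standard Cauchy truncated to
    [-trunc, trunc] via inverse-CDF sampling, then normalize so the
    perturbation lies on the delta sphere after scaling by delta.
    The truncation level `trunc` is an illustrative assumption."""
    rng = np.random.default_rng() if rng is None else rng
    a, b = np.arctan(-trunc), np.arctan(trunc)
    x = np.tan(a + rng.uniform(size=dim) * (b - a))
    return x / np.linalg.norm(x)

def sf_gradient_estimate(noisy_f, theta, delta=0.1, rng=None):
    """One-sided smoothed-functional gradient estimate:
    g = (dim / delta) * (F(theta + delta*d) - F(theta)) * d,
    where d is a random unit perturbation direction and F returns
    only noisy cost samples. The constant is the usual
    sphere-smoothing choice, used here purely for illustration."""
    d = truncated_cauchy_direction(theta.size, rng=rng)
    return (theta.size / delta) * (noisy_f(theta + delta * d) - noisy_f(theta)) * d

def minimize(noisy_f, theta0, steps=10000, lr0=0.2, delta=0.1, seed=0):
    """Robbins-Monro iteration driven by the estimator above,
    with a decaying step size lr0 / (k + 1)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(steps):
        theta -= (lr0 / (k + 1)) * sf_gradient_estimate(noisy_f, theta, delta, rng)
    return theta

if __name__ == "__main__":
    # Noisy quadratic: only noisy cost samples F(theta) are observed.
    noisy_f = lambda th: float(th @ th) + float(np.random.normal(scale=0.1))
    print(minimize(noisy_f, np.ones(10)))
```

With the decaying Robbins-Monro step size, the iterate drifts toward the minimizer of the expected cost even though only noisy samples are observed; shrinking delta reduces the estimator's bias at the price of higher variance.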
- M. Baes. Estimate sequence methods: Extensions and approximations. Technical report, IFOR, ETH Zurich, Switzerland, 2009.
- K. Balasubramanian and S. Ghadimi. Zeroth-order nonconvex stochastic optimization: Handling constraints, high dimensionality, and saddle points. Foundations of Computational Mathematics, 22:35–76, 2022.
- D. Bertsekas. Reinforcement Learning and Optimal Control. Athena Scientific, 2019.
- S. Bhatnagar. Adaptive Newton-based smoothed functional algorithms for simulation optimization. ACM Transactions on Modeling and Computer Simulation, 18(1):2:1–2:35, 2007.
- S. Bhatnagar and V. S. Borkar. Multiscale chaotic SPSA and smoothed functional algorithms for simulation optimization. Simulation, 79:568–580, 2003.
- S. Bhatnagar, H. L. Prasad, and L. A. Prashanth. Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods. Lecture Notes in Control and Information Sciences. Springer, 2013.
- N. Bhavsar and L. A. Prashanth. Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles. IEEE Transactions on Automatic Control, 2022.
- D. C. Chin. Comparative study of stochastic algorithms for system optimization based on gradient approximations. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 27(2):244–249, 1997.
- O. Devolder, F. Glineur, and Y. Nesterov. First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming, 146:37–75, 2014.
- P. J. Gawthrop and D. Sbarbaro. Stochastic approximation and multilayer perceptrons: The gain backpropagation algorithm. Complex Systems, 4(1):51–74, 1990.
- R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points: Online stochastic gradient for tensor decomposition. In Conference on Learning Theory (COLT), JMLR: Workshop and Conference Proceedings, 2015.
- L. Gerencser. Convergence rate of moments in stochastic approximation with simultaneous perturbation gradient approximation and resetting. IEEE Transactions on Automatic Control, 44(5):894–905, 1999.
- S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
- X. Hu, L. A. Prashanth, A. György, and C. Szepesvári. (Bandit) convex optimization with biased noisy gradient oracles. In Artificial Intelligence and Statistics, pages 819–828, 2016.
- C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan. How to escape saddle points efficiently. In ICML, 2017.
- V. Y. Katkovnik and Y. Kulchitsky. Convergence of a class of random search algorithms. Automation and Remote Control, 8:1321–1326, 1972.
- J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 23(3):462–466, 1952.
- J. Kreimer and R. Y. Rubinstein. Smoothed functionals and constrained stochastic approximation. JSTOR, 1972.
- H. Kushner and D. Clark. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, New York, 1978.
- Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17:527–566, 2017.
- R. Pemantle. Nonconvergence to unstable points in urn models and stochastic approximations. The Annals of Probability, 18(2):698–712, 1990.
- L. A. Prashanth, S. Bhatnagar, M. Fu, and S. Marcus. Adaptive system optimization using random directions stochastic approximation. IEEE Transactions on Automatic Control, 62(5):2223–2238, 2017.
- H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
- R. Y. Rubinstein. Simulation and the Monte Carlo Method. Wiley, New York, 1981.
- J. C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 37(3):332–341, 1992.
- P. G. Staneski. The truncated Cauchy distribution: Estimation of parameters and application to stock returns. PhD dissertation, Old Dominion University, 1990.
- M. Styblinski and T. Tang. Experiments in non-convex optimization: Stochastic approximation with function smoothing and simulated annealing. Neural Networks, 3(4):467–483, 1990.
- LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS. In Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference, pages 1181–1188. IEEE, 2009.
- J. Zhu. Hessian estimation via Stein's identity in black-box problems. Proceedings of Machine Learning Research, pages 1–17, 2021.