Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity (2210.05279v2)

Published 11 Oct 2022 in cs.LG and math.OC

Abstract: $\ell_0$ constrained optimization is prevalent in machine learning, particularly for high-dimensional problems, because it is a fundamental approach to sparse learning. Hard-thresholding gradient descent is a dominant technique for solving this problem. However, first-order gradients of the objective function may be unavailable or expensive to compute in many real-world problems, where zeroth-order (ZO) gradients can serve as a good surrogate. Unfortunately, it remains an open question whether ZO gradients can work with the hard-thresholding operator. To address this question, we focus on $\ell_0$ constrained black-box stochastic optimization and propose a new stochastic zeroth-order gradient hard-thresholding (SZOHT) algorithm with a general ZO gradient estimator powered by a novel random support sampling. We provide a convergence analysis of SZOHT under standard assumptions. Importantly, we reveal a conflict between the deviation of ZO estimators and the expansivity of the hard-thresholding operator, and derive a theoretical minimum for the number of random directions in the ZO gradient estimator. In addition, we find that the query complexity of SZOHT is independent of, or only weakly dependent on, the dimensionality under different settings. Finally, we illustrate the utility of our method on a portfolio optimization problem as well as black-box adversarial attacks.
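The abstract describes SZOHT only at a high level: estimate a zeroth-order gradient from function queries along random directions drawn on a randomly sampled support, take a gradient step, and project back onto the sparse set with hard thresholding. The Python sketch below illustrates that general pattern under stated assumptions; the helper names (hard_threshold, zo_gradient_estimate, szoht_sketch) and all parameter choices (number of random directions q, smoothing radius mu, step size, uniform support sampling) are illustrative placeholders, not the paper's exact algorithm or constants.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def zo_gradient_estimate(f, x, support, q, mu):
    """Two-point zeroth-order estimate averaged over q random directions
    restricted to `support` (a stand-in for the paper's random support sampling)."""
    d = x.size
    g = np.zeros(d)
    for _ in range(q):
        u = np.zeros(d)
        u[support] = np.random.randn(support.size)
        u /= np.linalg.norm(u) + 1e-12              # unit direction on the support
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    # Dimension scaling d/q follows a common ZO estimator convention,
    # not necessarily the paper's exact constant.
    return (d / q) * g

def szoht_sketch(f, d, k, s, steps=200, lr=0.05, q=10, mu=1e-3):
    """One plausible SZOHT-style loop: ZO gradient on a random support,
    gradient step, then hard-thresholding back onto the k-sparse set."""
    x = np.zeros(d)
    for _ in range(steps):
        support = np.random.choice(d, size=s, replace=False)
        g = zo_gradient_estimate(f, x, support, q, mu)
        x = hard_threshold(x - lr * g, k)
    return x

# Toy usage: a separable quadratic whose minimizer is 5-sparse.
if __name__ == "__main__":
    dim, k = 100, 5
    x_star = np.zeros(dim); x_star[:k] = 1.0
    f = lambda x: 0.5 * np.sum((x - x_star) ** 2)
    x_hat = szoht_sketch(f, dim, k, s=20)
    print("recovered support:", np.sort(np.nonzero(x_hat)[0]))
```

On the toy quadratic above, the iterates typically concentrate on the true support. The paper's analysis ties the required number of random directions q to the expansivity of the hard-thresholding operator; this sketch does not attempt to reproduce that analysis.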
