Optimal Algorithms for Stochastic Complementary Composite Minimization (2211.01758v2)

Published 3 Nov 2022 in cs.LG and math.OC

Abstract: Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization in the stochastic setting. This problem corresponds to the minimization of the sum of a (weakly) smooth function endowed with a stochastic first-order oracle, and a structured uniformly convex (possibly nonsmooth and non-Lipschitz) regularization term. Despite intensive work on closely related settings, prior to our work no complexity bounds for this problem were known. We close this gap by providing novel excess risk bounds, both in expectation and with high probability. Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems. We conclude by providing numerical results comparing our methods to the state of the art.
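To make the problem class concrete, here is a minimal, hypothetical sketch (not the paper's nearly optimal method) of a complementary composite instance: a smooth least-squares loss f(x) = (1/2n)||Ax - b||^2 accessed only through mini-batch stochastic gradients, plus a uniformly convex regularizer psi(x) = (lam/p)||x||_p^p with p >= 2, minimized by a plain stochastic proximal-gradient loop with a decaying step size. The loss, the step-size schedule, and all names (stochastic_prox_gradient, prox_lp) are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the complementary composite setting:
#   minimize F(x) = f(x) + psi(x)
# where f is smooth and only reachable through a stochastic first-order oracle,
# and psi is a uniformly convex (possibly nonsmooth, non-Lipschitz) regularizer,
# here psi(x) = (lam / p) * ||x||_p^p with p >= 2.
# The solver is a plain stochastic proximal-gradient loop, not the paper's method.
import numpy as np


def stochastic_grad(x, A, b, batch, rng):
    """Mini-batch stochastic gradient of f(x) = (1 / 2n) * ||Ax - b||^2."""
    idx = rng.choice(A.shape[0], size=batch, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / batch


def prox_lp(z, step, lam, p, iters=60):
    """Prox of step * (lam / p) * |x|^p, applied coordinate-wise by bisection.

    Solves x + step * lam * x**(p - 1) = |z| for x in [0, |z|], then restores signs.
    """
    z_abs = np.abs(z)
    lo, hi = np.zeros_like(z_abs), z_abs.copy()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_big = mid + step * lam * mid ** (p - 1) > z_abs
        hi = np.where(too_big, mid, hi)
        lo = np.where(too_big, lo, mid)
    return np.sign(z) * 0.5 * (lo + hi)


def stochastic_prox_gradient(A, b, lam=0.1, p=3.0, steps=500, batch=16, eta=0.5, seed=0):
    """Stochastic proximal-gradient descent with a 1/sqrt(t) step-size decay."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for t in range(1, steps + 1):
        step = eta / np.sqrt(t)
        g = stochastic_grad(x, A, b, batch, rng)
        x = prox_lp(x - step * g, step, lam, p)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 20))
    x_true = rng.standard_normal(20)
    b = A @ x_true + 0.1 * rng.standard_normal(200)
    x_hat = stochastic_prox_gradient(A, b)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

For p >= 2, the regularizer (lam/p)||x||_p^p is uniformly convex of degree p but in general neither strongly convex nor Lipschitz on all of R^d, which is the kind of regularization term the abstract refers to; the paper's nearly optimal algorithms and their excess-risk guarantees are developed in the full text.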
