The Price of Adaptivity in Stochastic Convex Optimization (2402.10898v3)
Abstract: We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and double-logarithmic for median suboptimality. When there is uncertainty in both distance and gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch. En route, we also establish tight upper and lower bounds for (known-parameter) high-probability stochastic convex optimization with heavy-tailed and bounded noise, respectively.
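As a minimal sketch of the abstract's informal definition, one way the price of adaptivity could be formalized is as a worst-case ratio of suboptimalities; the notation here (the problem class P(D, G) of instances with initial distance at most D and gradient-norm bound G, the algorithm A, the uncertainty set U, and the sample budget T) is illustrative and not taken from the paper:

\[
\mathrm{PoA}_T(\mathsf{A};\,\mathcal{U})
\;=\;
\sup_{(D,G)\in\mathcal{U}}\;
\frac{\displaystyle\sup_{f\in\mathcal{P}(D,G)} \mathbb{E}\big[f(x^{\mathsf{A}}_T)-f^{\star}\big]}
     {\displaystyle\inf_{\mathsf{A}'}\,\sup_{f\in\mathcal{P}(D,G)} \mathbb{E}\big[f(x^{\mathsf{A}'}_T)-f^{\star}\big]},
\qquad
\inf_{\mathsf{A}'}\,\sup_{f\in\mathcal{P}(D,G)} \mathbb{E}\big[f(x^{\mathsf{A}'}_T)-f^{\star}\big]
\;\asymp\; \frac{DG}{\sqrt{T}},
\]

where the second display is the classical known-parameter minimax rate for non-smooth stochastic convex optimization. Read this way, the abstract's claims say that any single algorithm required to cover a nontrivial uncertainty set U must have this ratio grow at least logarithmically when only the distance D is unknown, and polynomially in the level of uncertainty when both D and G are unknown.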