Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes

Published 16 Jun 2023 in math.OC, cs.CC, cs.DS, cs.LG, and stat.ML | arXiv:2306.10096v1

Abstract: We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $\epsilon$ with a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball -- our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory, and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. While several works gave lower-bound trade-offs (impossibility results) -- we make explicit here their dependence on $\ln\frac{1}{\epsilon}$, showing that these lower bounds also hold in any sub-polynomial regime -- to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over the blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretically optimal memory usage and improves the oracle-complexity of gradient descent.
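
To make the stated trade-off concrete, the following Python sketch (not from the paper) evaluates the two resource bounds from the abstract, up to constant factors, for several block counts $p$. The choice $C = 1$ and the values of `d` and `eps` are illustrative assumptions only.

```python
import math

def tradeoff(d: int, eps: float, p: int, C: float = 1.0):
    """Evaluate the abstract's resource bounds up to constant factors.

    Memory:       O((d^2 / p) * ln(1/eps)) bits.
    Oracle calls: O((C * (d / p) * ln(1/eps)) ** p).
    C >= 1 is the paper's universal constant; C = 1 is an assumption here.
    """
    log_term = math.log(1.0 / eps)
    memory_bits = (d ** 2 / p) * log_term
    oracle_calls = (C * (d / p) * log_term) ** p
    return memory_bits, oracle_calls

d, eps = 100, 1e-6  # illustrative dimension and target accuracy
for p in (1, 2, 10, d):
    mem, calls = tradeoff(d, eps, p)
    print(f"p = {p:>3}: memory ~ {mem:.2e} bits, oracle calls ~ {calls:.2e}")
```

At $p = 1$ the bounds recover the profile of a standard cutting-plane method ($\mathcal O(d^2 \ln\frac{1}{\epsilon})$ memory, $\mathcal O(d \ln\frac{1}{\epsilon})$ oracle calls), while $p = d$ reduces memory to $\mathcal O(d \ln\frac{1}{\epsilon})$ bits -- the information-theoretically optimal usage per the abstract -- at the price of many more oracle calls, which improves on gradient descent only in the regime $\epsilon \leq d^{-\Omega(d)}$.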

References (50)
  1. Kurt M Anstreicher “On Vaidya’s volumetric cutting plane method for convex programming” In Mathematics of Operations Research 22.1 INFORMS, 1997, pp. 63–89
  2. Kurt M Anstreicher “Towards a practical volumetric cutting plane method for convex programming” In SIAM Journal on Optimization 9.1 SIAM, 1998, pp. 190–206
  3. Aharon Ben-Tal and Arkadi Nemirovski “Non-Euclidean restricted memory level method for large-scale convex optimization” In Mathematical Programming 102.3, 2005, pp. 407–456 DOI: 10.1007/s10107-004-0553-4
  4. Dimitris Bertsimas and Santosh Vempala “Solving convex programs by random walks” In Journal of the ACM (JACM) 51.4 ACM New York, NY, USA, 2004, pp. 540–556
  5. Moise Blanchard, Junhui Zhang and Patrick Jaillet “Quadratic Memory is Necessary for Optimal Query Complexity in Convex Optimization: Center-of-Mass is Pareto-Optimal” In arXiv preprint arXiv:2302.04963, 2023
  6. Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen and David P. Woodruff “Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality” In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’16 Cambridge, MA, USA: Association for Computing Machinery, 2016, pp. 1011–1020 DOI: 10.1145/2897518.2897582
  7. C.G. Broyden “The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations” In IMA Journal of Applied Mathematics 6.1, 1970, pp. 76–90 DOI: 10.1093/imamat/6.1.76
  8. Sébastien Bubeck “Convex Optimization: Algorithms and Complexity” In Foundations and Trends® in Machine Learning 8.3-4, 2015, pp. 231–357 DOI: 10.1561/2200000050
  9. Sébastien Bubeck, Yin Tat Lee and Mohit Singh “A geometric alternative to Nesterov’s accelerated gradient descent”, 2015 arXiv:1506.08187 [math.OC]
  10. Alexandre d’Aspremont, Damien Scieur and Adrien Taylor “Acceleration Methods” In Foundations and Trends® in Optimization 5.1-2, 2021, pp. 1–245 DOI: 10.1561/2400000036
  11. Yuval Dagan, Gil Kur and Ohad Shamir “Space lower bounds for linear prediction in the streaming model” In Proceedings of the Thirty-Second Conference on Learning Theory 99, Proceedings of Machine Learning Research PMLR, 2019, pp. 929–954 URL: https://proceedings.mlr.press/v99/dagan19b.html
  12. Yuval Dagan and Ohad Shamir “Detecting Correlations with Little Memory and Communication” In Proceedings of the 31st Conference On Learning Theory 75, Proceedings of Machine Learning Research PMLR, 2018, pp. 1145–1198 URL: https://proceedings.mlr.press/v75/dagan18a.html
  13. James Demmel, Ioana Dumitriu and Olga Holtz “Fast linear algebra is stable” In Numerische Mathematik 108.1 Springer, 2007, pp. 59–91
  14. Donghwan Kim and Jeffrey A. Fessler “Optimized first-order methods for smooth convex minimization” In Mathematical Programming 159, 2014 DOI: 10.1007/s10107-015-0949-3
  15. Yoel Drori and Marc Teboulle “Performance of first-order methods for smooth convex minimization: a novel approach” In Mathematical Programming 145.1, 2014, pp. 451–482 DOI: 10.1007/s10107-013-0653-0
  16. Dmitriy Drusvyatskiy, Maryam Fazel and Scott Roy “An Optimal First Order Method Based on Optimal Quadratic Averaging” In SIAM Journal on Optimization 28.1, 2018, pp. 251–271 DOI: 10.1137/16M1072528
  17. Uriel Feige and Gideon Schechtman “On the optimality of the random hyperplane rounding technique for MAX CUT” In Random Structures & Algorithms 20.3 Wiley Online Library, 2002, pp. 403–440
  18. R. Fletcher “A new approach to variable metric algorithms” In The Computer Journal 13.3, 1970, pp. 317–322 DOI: 10.1093/comjnl/13.3.317
  19. Donald Goldfarb “A Family of Variable-Metric Methods Derived by Variational Means” In Mathematics of Computation 24.109 American Mathematical Society, 1970, pp. 23–26 URL: http://www.jstor.org/stable/2004873
  20. Haotian Jiang, Yin Tat Lee, Zhao Song and Sam Chiu-wai Wong “An Improved Cutting Plane Method for Convex Optimization, Convex-Concave Games, and Its Applications” In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020 Chicago, IL, USA: Association for Computing Machinery, 2020, pp. 944–953 DOI: 10.1145/3357713.3384284
  21. Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant and Honglin Yuan “Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales” In Proceedings of Thirty Fifth Conference on Learning Theory 178, Proceedings of Machine Learning Research PMLR, 2022, pp. 2431–2540 URL: https://proceedings.mlr.press/v178/kelner22a.html
  22. Guanghui Lan “Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization” In Mathematical Programming 149.1, 2015, pp. 1–45 DOI: 10.1007/s10107-013-0737-x
  23. Guanghui Lan, Soomin Lee and Yi Zhou “Communication-efficient algorithms for decentralized and stochastic optimization” In Mathematical Programming 180.1, 2020, pp. 237–284 DOI: 10.1007/s10107-018-1355-4
  24. Quoc Le, Alex Smola and S.V.N. Vishwanathan “Bundle Methods for Machine Learning” In Advances in Neural Information Processing Systems 20 Curran Associates, Inc., 2007 URL: https://proceedings.neurips.cc/paper_files/paper/2007/file/26337353b7962f533d78c762373b3318-Paper.pdf
  25. Yin Tat Lee, Aaron Sidford and Sam Chiu-wai Wong “A faster cutting plane method and its implications for combinatorial and convex optimization” In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, 2015, pp. 1049–1065 IEEE
  26. Claude Lemaréchal, Arkadi Nemirovski and Yurii Nesterov “New variants of bundle methods” In Mathematical Programming 69.1, 1995, pp. 111–147 DOI: 10.1007/BF01585555
  27. Adrian S. Lewis and Michael L. Overton “Nonsmooth optimization via quasi-Newton methods” In Mathematical Programming 141.1, 2013, pp. 135–163
  28. Dong C. Liu and Jorge Nocedal “On the limited memory BFGS method for large scale optimization” In Mathematical Programming 45.1, 1989, pp. 503–528
  29. Annie Marsden, Vatsal Sharan, Aaron Sidford and Gregory Valiant “Efficient convex optimization requires superlinear memory” In Conference on Learning Theory, 2022, pp. 2390–2430 PMLR
  30. Dana Moshkovitz and Michal Moshkovitz “Mixing Implies Lower Bounds for Space Bounded Learning” In Proceedings of the 2017 Conference on Learning Theory PMLR, 2017, pp. 1516–1566
  31. João F.C. Mota, João M.F. Xavier, Pedro M.Q. Aguiar and Markus Püschel “D-ADMM: A Communication-Efficient Distributed Algorithm for Separable Optimization” In IEEE Transactions on Signal Processing 61.10, 2013, pp. 2718–2723 DOI: 10.1109/TSP.2013.2254478
  32. Arkadi Nemirovski and David Borisovich Yudin “Problem Complexity and Method Efficiency in Optimization”, A Wiley-Interscience publication Wiley, 1983
  33. Yurii Nesterov “A method of solving a convex programming problem with convergence rate $O(1/k^2)$” In Dokl. Akad. Nauk SSSR 269, 1983, pp. 543–547
  34. Yurii Nesterov “Introductory lectures on convex optimization: A basic course” Springer Science & Business Media, 2003
  35. Jorge Nocedal “Updating Quasi-Newton Matrices with Limited Storage” In Mathematics of Computation 35.151 American Mathematical Society, 1980, pp. 773–782
  36. Srinivasan Ramaswamy and John E Mitchell “A long step cutting plane algorithm that uses the volumetric barrier”, 1995
  37. Ran Raz “A Time-Space Lower Bound for a Large Class of Learning Problems” In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017, pp. 732–742 DOI: 10.1109/FOCS.2017.73
  38. Sashank J. Reddi, Jakub Konečný, Peter Richtárik, Barnabás Póczós and Alex Smola “AIDE: Fast and Communication Efficient Distributed Optimization” In ArXiv abs/1608.06879, 2016
  39. B. Van Scoy, R.A. Freeman and K.M. Lynch “The Fastest Known Globally Convergent First-Order Method for Minimizing Strongly Convex Functions” In IEEE Control Systems Letters PP.99, 2017, pp. 1–1 DOI: 10.1109/LCSYS.2017.2722406
  40. Ohad Shamir, Nati Srebro and Tong Zhang “Communication-Efficient Distributed Optimization using an Approximate Newton-type Method” In Proceedings of the 31st International Conference on Machine Learning 32.2, Proceedings of Machine Learning Research Bejing, China: PMLR, 2014, pp. 1000–1008 URL: https://proceedings.mlr.press/v32/shamir14.html
  41. D.F. Shanno “Conditioning of Quasi-Newton Methods for Function Minimization” In Mathematics of Computation 24.111 American Mathematical Society, 1970, pp. 647–656 URL: http://www.jstor.org/stable/2004840
  42. Vatsal Sharan, Aaron Sidford and Gregory Valiant “Memory-Sample Tradeoffs for Linear Regression with Small Error” In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019 Association for Computing Machinery, 2019, pp. 890–901
  43. Virginia Smith, Simone Forte, Chenxin Ma, Martin Takáč, Michael I. Jordan and Martin Jaggi “CoCoA: A General Framework for Communication-Efficient Distributed Optimization” In J. Mach. Learn. Res. 18.1 JMLR.org, 2017, pp. 8590–8638
  44. Adrien Taylor and Yoel Drori “An optimal gradient method for smooth strongly convex minimization” In Mathematical Programming 199.1, 2023, pp. 557–594 DOI: 10.1007/s10107-022-01839-y
  45. Choon Hui Teo, S.V.N. Vishwanathan, Alex Smola and Quoc V. Le “Bundle Methods for Regularized Risk Minimization” In Journal of Machine Learning Research 11.10, 2010, pp. 311–365 URL: http://jmlr.org/papers/v11/teo10a.html
  46. Pravin M Vaidya “A new algorithm for minimizing convex functions over convex sets” In Mathematical programming 73.3 Springer, 1996, pp. 291–341
  47. Jialei Wang, Weiran Wang and Nathan Srebro “Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox” In Proceedings of the 2017 Conference on Learning Theory 65, Proceedings of Machine Learning Research PMLR, 2017, pp. 1882–1919 URL: https://proceedings.mlr.press/v65/wang17a.html
  48. Jianqiao Wangni, Jialei Wang, Ji Liu and Tong Zhang “Gradient Sparsification for Communication-Efficient Distributed Optimization” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18 Montréal, Canada: Curran Associates Inc., 2018, pp. 1306–1316
  49. Blake Woodworth and Nathan Srebro “Open problem: The oracle complexity of convex optimization with limited memory” In Conference on Learning Theory, 2019, pp. 3202–3210 PMLR
  50. Yuchen Zhang, John C. Duchi and Martin J. Wainwright “Communication-efficient algorithms for statistical optimization” In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012, pp. 6792–6792 DOI: 10.1109/CDC.2012.6426691