Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes
Abstract: We propose a family of recursive cutting-plane algorithms for solving feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, to find a point within a ball of radius $\epsilon$ using a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball -- our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. While several works have given lower-bound trade-offs (impossibility results) -- we make explicit here their dependence on $\ln\frac{1}{\epsilon}$, showing that they also hold in any sub-polynomial regime -- to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over the blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretically optimal memory usage and improves on the oracle complexity of gradient descent.
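The trade-off stated in the abstract can be made concrete by evaluating the two complexity bounds as the block parameter $p$ varies. The sketch below plugs sample values into $\frac{d^2}{p}\ln\frac{1}{\epsilon}$ (memory in bits) and $(C\frac{d}{p}\ln\frac{1}{\epsilon})^p$ (oracle calls); the constant `C = 2.0` is an illustrative placeholder, since the paper only guarantees some universal $C \geq 1$, and hidden big-O constants are dropped.

```python
import math

def memory_bits(d: int, p: int, eps: float) -> float:
    """Memory bound O((d^2 / p) * ln(1/eps)), up to constants."""
    return (d ** 2 / p) * math.log(1.0 / eps)

def oracle_calls(d: int, p: int, eps: float, C: float = 2.0) -> float:
    """Oracle-call bound O((C * (d/p) * ln(1/eps))^p), up to constants."""
    return (C * (d / p) * math.log(1.0 / eps)) ** p

d, eps = 100, 1e-6
for p in (1, 10, d):
    print(f"p={p:3d}: memory ~ {memory_bits(d, p, eps):.3e} bits, "
          f"oracle calls ~ {oracle_calls(d, p, eps):.3e}")
```

At $p=1$ the bounds recover a cutting-plane-like profile (quadratic memory, $\mathcal O(d\ln\frac{1}{\epsilon})$ calls), while $p=d$ trades memory down to $\mathcal O(d\ln\frac{1}{\epsilon})$ bits at the cost of $(C\ln\frac{1}{\epsilon})^d$ calls, which beats gradient descent only once $\epsilon \leq d^{-\Omega(d)}$.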
- Kurt M Anstreicher “On Vaidya’s volumetric cutting plane method for convex programming” In Mathematics of Operations Research 22.1 INFORMS, 1997, pp. 63–89
- Kurt M Anstreicher “Towards a practical volumetric cutting plane method for convex programming” In SIAM Journal on Optimization 9.1 SIAM, 1998, pp. 190–206
- “Non-euclidean restricted memory level method for large-scale convex optimization” In Mathematical Programming 102.3, 2005, pp. 407–456 DOI: 10.1007/s10107-004-0553-4
- “Solving convex programs by random walks” In Journal of the ACM (JACM) 51.4 ACM New York, NY, USA, 2004, pp. 540–556
- Moise Blanchard, Junhui Zhang and Patrick Jaillet “Quadratic Memory is Necessary for Optimal Query Complexity in Convex Optimization: Center-of-Mass is Pareto-Optimal” In arXiv preprint arXiv:2302.04963, 2023
- “Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality” In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’16 Cambridge, MA, USA: Association for Computing Machinery, 2016, pp. 1011–1020 DOI: 10.1145/2897518.2897582
- C.G. Broyden “The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations” In IMA Journal of Applied Mathematics 6.1, 1970, pp. 76–90 DOI: 10.1093/imamat/6.1.76
- Sébastien Bubeck “Convex Optimization: Algorithms and Complexity” In Foundations and Trends® in Machine Learning 8.3-4, 2015, pp. 231–357 DOI: 10.1561/2200000050
- Sébastien Bubeck, Yin Tat Lee and Mohit Singh “A geometric alternative to Nesterov’s accelerated gradient descent”, 2015 arXiv:1506.08187 [math.OC]
- Alexandre d’Aspremont, Damien Scieur and Adrien Taylor “Acceleration Methods” In Foundations and Trends® in Optimization 5.1-2, 2021, pp. 1–245 DOI: 10.1561/2400000036
- Yuval Dagan, Gil Kur and Ohad Shamir “Space lower bounds for linear prediction in the streaming model” In Proceedings of the Thirty-Second Conference on Learning Theory 99, Proceedings of Machine Learning Research PMLR, 2019, pp. 929–954 URL: https://proceedings.mlr.press/v99/dagan19b.html
- “Detecting Correlations with Little Memory and Communication” In Proceedings of the 31st Conference On Learning Theory 75, Proceedings of Machine Learning Research PMLR, 2018, pp. 1145–1198 URL: https://proceedings.mlr.press/v75/dagan18a.html
- James Demmel, Ioana Dumitriu and Olga Holtz “Fast linear algebra is stable” In Numerische Mathematik 108.1 Springer, 2007, pp. 59–91
- “Optimized first-order methods for smooth convex minimization” In Mathematical Programming 159, 2014 DOI: 10.1007/s10107-015-0949-3
- “Performance of first-order methods for smooth convex minimization: a novel approach” In Mathematical Programming 145.1, 2014, pp. 451–482 DOI: 10.1007/s10107-013-0653-0
- Dmitriy Drusvyatskiy, Maryam Fazel and Scott Roy “An Optimal First Order Method Based on Optimal Quadratic Averaging” In SIAM Journal on Optimization 28.1, 2018, pp. 251–271 DOI: 10.1137/16M1072528
- “On the optimality of the random hyperplane rounding technique for MAX CUT” In Random Structures & Algorithms 20.3 Wiley Online Library, 2002, pp. 403–440
- R. Fletcher “A new approach to variable metric algorithms” In The Computer Journal 13.3, 1970, pp. 317–322 DOI: 10.1093/comjnl/13.3.317
- Donald Goldfarb “A Family of Variable-Metric Methods Derived by Variational Means” In Mathematics of Computation 24.109 American Mathematical Society, 1970, pp. 23–26 URL: http://www.jstor.org/stable/2004873
- “An Improved Cutting Plane Method for Convex Optimization, Convex-Concave Games, and Its Applications” In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020 Chicago, IL, USA: Association for Computing Machinery, 2020, pp. 944–953 DOI: 10.1145/3357713.3384284
- “Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales” In Proceedings of Thirty Fifth Conference on Learning Theory 178, Proceedings of Machine Learning Research PMLR, 2022, pp. 2431–2540 URL: https://proceedings.mlr.press/v178/kelner22a.html
- Guanghui Lan “Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization” In Mathematical Programming 149.1, 2015, pp. 1–45 DOI: 10.1007/s10107-013-0737-x
- Guanghui Lan, Soomin Lee and Yi Zhou “Communication-efficient algorithms for decentralized and stochastic optimization” In Mathematical Programming 180.1, 2020, pp. 237–284 DOI: 10.1007/s10107-018-1355-4
- Quoc Le, Alex Smola and S.V.N. Vishwanathan “Bundle Methods for Machine Learning” In Advances in Neural Information Processing Systems 20 Curran Associates, Inc., 2007 URL: https://proceedings.neurips.cc/paper_files/paper/2007/file/26337353b7962f533d78c762373b3318-Paper.pdf
- Yin Tat Lee, Aaron Sidford and Sam Chiu-wai Wong “A faster cutting plane method and its implications for combinatorial and convex optimization” In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, 2015, pp. 1049–1065 IEEE
- Claude Lemaréchal, Arkadi Nemirovski and Yurii Nesterov “New variants of bundle methods” In Mathematical Programming 69.1, 1995, pp. 111–147 DOI: 10.1007/BF01585555
- Adrian S. Lewis and Michael L. Overton “Nonsmooth optimization via quasi-Newton methods” In Mathematical Programming 141.1, 2013, pp. 135–163
- Dong C. Liu and Jorge Nocedal “On the limited memory BFGS method for large scale optimization” In Mathematical Programming 45.1, 1989, pp. 503–528
- “Efficient convex optimization requires superlinear memory” In Conference on Learning Theory, 2022, pp. 2390–2430 PMLR
- “Mixing Implies Lower Bounds for Space Bounded Learning” In Proceedings of the 2017 Conference on Learning Theory PMLR, 2017, pp. 1516–1566
- “D-ADMM: A Communication-Efficient Distributed Algorithm for Separable Optimization” In IEEE Transactions on Signal Processing 61.10, 2013, pp. 2718–2723 DOI: 10.1109/TSP.2013.2254478
- Arkadi Nemirovski and David Borisovich Yudin “Problem Complexity and Method Efficiency in Optimization”, A Wiley-Interscience publication Wiley, 1983
- Yurii Nesterov “A method of solving a convex programming problem with convergence rate $O(1/k^2)$” In Dokl. Akad. Nauk SSSR 269, 1983, pp. 543–547
- Yurii Nesterov “Introductory lectures on convex optimization: A basic course” Springer Science & Business Media, 2003
- Jorge Nocedal “Updating Quasi-Newton Matrices with Limited Storage” In Mathematics of Computation 35.151 American Mathematical Society, 1980, pp. 773–782
- Srinivasan Ramaswamy and John E Mitchell “A long step cutting plane algorithm that uses the volumetric barrier”, 1995
- Ran Raz “A Time-Space Lower Bound for a Large Class of Learning Problems” In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017, pp. 732–742 DOI: 10.1109/FOCS.2017.73
- “AIDE: Fast and Communication Efficient Distributed Optimization” In ArXiv abs/1608.06879, 2016
- B. Van Scoy, R.A. Freeman and K.M. Lynch “The Fastest Known Globally Convergent First-Order Method for Minimizing Strongly Convex Functions” In IEEE Control Systems Letters, 2017 DOI: 10.1109/LCSYS.2017.2722406
- Ohad Shamir, Nati Srebro and Tong Zhang “Communication-Efficient Distributed Optimization using an Approximate Newton-type Method” In Proceedings of the 31st International Conference on Machine Learning 32.2, Proceedings of Machine Learning Research Beijing, China: PMLR, 2014, pp. 1000–1008 URL: https://proceedings.mlr.press/v32/shamir14.html
- D.F. Shanno “Conditioning of Quasi-Newton Methods for Function Minimization” In Mathematics of Computation 24.111 American Mathematical Society, 1970, pp. 647–656 URL: http://www.jstor.org/stable/2004840
- Vatsal Sharan, Aaron Sidford and Gregory Valiant “Memory-Sample Tradeoffs for Linear Regression with Small Error” In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019 Association for Computing Machinery, 2019, pp. 890–901
- “CoCoA: A General Framework for Communication-Efficient Distributed Optimization” In J. Mach. Learn. Res. 18.1 JMLR.org, 2017, pp. 8590–8638
- “An optimal gradient method for smooth strongly convex minimization” In Mathematical Programming 199.1, 2023, pp. 557–594 DOI: 10.1007/s10107-022-01839-y
- “Bundle Methods for Regularized Risk Minimization” In Journal of Machine Learning Research 11.10, 2010, pp. 311–365 URL: http://jmlr.org/papers/v11/teo10a.html
- Pravin M Vaidya “A new algorithm for minimizing convex functions over convex sets” In Mathematical programming 73.3 Springer, 1996, pp. 291–341
- Jialei Wang, Weiran Wang and Nathan Srebro “Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox” In Proceedings of the 2017 Conference on Learning Theory 65, Proceedings of Machine Learning Research PMLR, 2017, pp. 1882–1919 URL: https://proceedings.mlr.press/v65/wang17a.html
- “Gradient Sparsification for Communication-Efficient Distributed Optimization” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18 Montréal, Canada: Curran Associates Inc., 2018, pp. 1306–1316
- “Open problem: The oracle complexity of convex optimization with limited memory” In Conference on Learning Theory, 2019, pp. 3202–3210 PMLR
- Yuchen Zhang, John C. Duchi and Martin J. Wainwright “Communication-efficient algorithms for statistical optimization” In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012, pp. 6792–6792 DOI: 10.1109/CDC.2012.6426691