
Memory-Query Tradeoffs for Randomized Convex Optimization (2306.12534v1)

Published 21 Jun 2023 in cs.DS, cs.AI, cs.LG, and stat.ML

Abstract: We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$ queries, for any constant $\delta\in (0,1)$ and when the precision $\epsilon$ is quasipolynomially small in $d$. Our result implies that cutting plane methods, which use $\tilde{O}(d^2)$ bits of memory and $\tilde{O}(d)$ queries, are Pareto-optimal among randomized first-order algorithms, and quadratic memory is required to achieve optimal query complexity for convex optimization.
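
To make the quadratic-memory end of this tradeoff concrete, the sketch below (not taken from the paper) is a generic NumPy implementation of the ellipsoid method, the simplest member of the cutting-plane family; `f` and `subgrad` are placeholder callables supplied by the user. The only state carried between first-order queries is the center and a $d \times d$ shape matrix, i.e. $\Theta(d^2)$ numbers. Its query count, on the order of $d^2\log(1/\epsilon)$, is worse than the $\tilde{O}(d)$ of center-of-mass or Vaidya-style cutting-plane methods, but the memory footprint is the same order, and the paper's lower bound says this quadratic memory cannot be substantially reduced without a polynomial blow-up in queries.

```python
import numpy as np

def ellipsoid_minimize(f, subgrad, d, eps=1e-3, max_iters=None):
    """Minimize a 1-Lipschitz convex function f over the unit ball in R^d.

    Minimal ellipsoid-method sketch (assumed setup, not the paper's construction).
    The persistent state is the center c (d floats) and the d x d shape matrix P,
    i.e. Theta(d^2) numbers -- the quadratic memory footprint of cutting-plane methods.
    """
    assert d >= 2, "the shape-matrix update below assumes d >= 2"
    c = np.zeros(d)              # current ellipsoid center
    P = np.eye(d)                # E = {x : (x - c)^T P^{-1} (x - c) <= 1}, starts as the unit ball
    best_x, best_val = c.copy(), f(c)
    if max_iters is None:
        # standard O(d^2 log(1/eps)) iteration bound, up to constants
        max_iters = int(2 * d * d * np.log(1.0 / eps)) + 1

    for _ in range(max_iters):
        if np.linalg.norm(c) > 1.0:
            g = c                # feasibility cut: hyperplane separating c from the unit ball
        else:
            g = subgrad(c)       # optimality cut: one first-order query
            val = f(c)
            if val < best_val:
                best_x, best_val = c.copy(), val
        Pg = P @ g
        gPg = float(g @ Pg)
        if gPg <= 1e-24:         # (near-)zero subgradient: current center is (near-)optimal
            break
        gt = Pg / np.sqrt(gPg)   # step direction, normalized in the P-metric
        # central-cut ellipsoid update of the center and shape matrix
        c = c - gt / (d + 1)
        P = (d * d / (d * d - 1.0)) * (P - (2.0 / (d + 1)) * np.outer(gt, gt))
    return best_x, best_val


# Toy usage: f(x) = ||x - x*|| is 1-Lipschitz and minimized over the unit ball at x*.
d = 20
x_star = np.full(d, 0.1)
f = lambda x: np.linalg.norm(x - x_star)
subgrad = lambda x: (x - x_star) / max(np.linalg.norm(x - x_star), 1e-12)
x_hat, val = ellipsoid_minimize(f, subgrad, d, eps=1e-4)
print(f"f(x_hat) = {val:.6f}")
```

For contrast, plain (sub)gradient descent sits at the low-memory end of the tradeoff: it stores only $O(d)$ numbers between queries but needs on the order of $1/\epsilon^2$ queries in this setting.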
