Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time (2402.03625v3)

Published 6 Feb 2024 in cs.LG and math.OC

Abstract: In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of O(√(log n)), where n is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.
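
To make the problem pairing in the abstract concrete, the sketch below (not code from the paper) assumes the standard convex reformulation of two-layer ReLU training with squared loss and weight decay, in which each ReLU activation pattern D_i = diag(1[Xg ≥ 0]) receives its own weight vector under cone constraints; dropping those constraints leaves a group-lasso program, which is the kind of convex relaxation the abstract refers to. The use of cvxpy, the random sampling of activation patterns, and all parameter values are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' code): solve a sampled convex relaxation of
# two-layer ReLU network training with squared loss and weight decay.
# Assumed formulation: each activation pattern D_i = diag(1[X g >= 0]) gets one
# weight vector; dropping the cone constraints (2 D_i - I) X v_i >= 0 of the
# exact convex reformulation leaves the group-lasso program solved here.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, num_patterns = 50, 10, 64   # samples, features, sampled patterns (illustrative)
beta = 0.1                        # weight-decay strength (illustrative)

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Sample distinct ReLU activation patterns via random Gaussian directions.
G = rng.standard_normal((d, num_patterns))
D = np.unique((X @ G >= 0).T.astype(float), axis=0)  # rows are 0/1 masks over the samples
P = D.shape[0]

# Group-lasso relaxation: one weight vector per sampled pattern, no cone constraints.
V = cp.Variable((P, d))
prediction = sum(cp.multiply(D[i], X @ V[i]) for i in range(P))
objective = 0.5 * cp.sum_squares(prediction - y) + beta * cp.sum(cp.norm(V, 2, axis=1))
problem = cp.Problem(cp.Minimize(objective))
problem.solve()

print("optimal value of the sampled convex relaxation:", problem.value)
```

The paper's guarantees concern the relaxation itself; restricting to a random subset of activation patterns here is only meant to keep the example small enough to run.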
