Optimal Sets and Solution Paths of ReLU Networks (2306.00119v2)
Abstract: We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective. Since all stationary points of the ReLU training problem can be represented as optima of sub-sampled convex programs, our work provides a general expression for all critical points of the non-convex objective. We then leverage our results to provide an optimal pruning algorithm for computing minimal networks, establish conditions for the regularization path of ReLU networks to be continuous, and develop sensitivity results for minimal ReLU networks.
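The convex reformulation the abstract refers to is indexed by the ReLU activation patterns that gate vectors can induce on the training data: each pattern corresponds to a region of a hyperplane arrangement, which is where the polyhedral structure of the optimal set comes from. Below is a minimal sketch (not the paper's implementation) that enumerates these patterns for a toy dataset by sampling random gates; the function name and sampling scheme are illustrative assumptions, not part of the paper.

```python
import numpy as np

def enumerate_activation_patterns(X, num_samples=10_000, seed=0):
    """Sample random gate vectors g and collect the distinct ReLU
    activation patterns diag(1{X g >= 0}) they induce on the rows of X.
    Each distinct pattern indexes one block of the convex program."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, num_samples))        # random gate vectors
    signs = (X @ G >= 0).astype(np.int8)             # n x num_samples 0/1 patterns
    return sorted({tuple(col) for col in signs.T})   # deduplicate columns

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))                      # tiny toy dataset, n=5, d=2
pats = enumerate_activation_patterns(X)
print(f"{len(pats)} distinct activation patterns found")
```

For data in general position in two dimensions, the n hyperplanes through the origin cut the gate space into at most 2n sectors, so the sampled pattern count here is bounded by 10; in general the number of patterns grows polynomially in n for fixed d, which is what makes the convex parameterization tractable.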