A provable control of sensitivity of neural networks through a direct parameterization of the overall bi-Lipschitzness (2404.09821v2)

Published 15 Apr 2024 in cs.LG and stat.ML

Abstract: While neural networks enjoy outstanding flexibility and exhibit unprecedented performance, the mechanism behind their behavior is still not well understood. To tackle this fundamental challenge, researchers have tried to restrict and manipulate some of their properties in order to gain new insights and better control over them. In particular, over the past few years, the concept of \emph{bi-Lipschitzness} has proved to be a beneficial inductive bias in many areas. However, due to its complexity, the design and control of bi-Lipschitz architectures lag behind, and a model designed specifically for bi-Lipschitzness, providing a direct and simple control of the constants together with a solid theoretical analysis, has been lacking. In this work, we investigate and propose a novel framework for bi-Lipschitzness that achieves such a clear and tight control, based on convex neural networks and the Legendre-Fenchel duality. Its desirable properties are illustrated with concrete experiments. We also apply this framework to uncertainty estimation and to monotone problem settings to illustrate its broad range of applications.
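
The core mechanism hinted at in the abstract admits a compact illustration. Below is a minimal sketch in PyTorch, not the authors' exact architecture: by standard convex analysis, the gradient of a potential F(x) = (α/2)‖x‖² + g(x), where g is convex with a (β − α)-Lipschitz gradient, is bi-Lipschitz with constants exactly α and β, i.e. α‖x − y‖ ≤ ‖∇F(x) − ∇F(y)‖ ≤ β‖x − y‖. The module name `BiLipschitzGradientMap` and the one-hidden-layer softplus potential are illustrative assumptions; per the abstract, the paper's actual framework is built on convex neural networks and the Legendre-Fenchel duality.

```python
# Minimal sketch (illustrative, not the paper's BLNN): a map f = grad F with
# F(x) = (alpha/2)||x||^2 + g(x), where g is convex and has a certified
# (beta - alpha)-Lipschitz gradient. Then f is bi-Lipschitz with constants
# (alpha, beta) chosen directly by the user.
import torch
import torch.nn as nn

class BiLipschitzGradientMap(nn.Module):
    """f(x) = alpha * x + grad g(x), with Hessian(g) <= (beta - alpha) I."""

    def __init__(self, dim: int, hidden: int, alpha: float, beta: float):
        super().__init__()
        assert 0 < alpha < beta
        self.alpha, self.beta = alpha, beta
        self.W = nn.Parameter(torch.randn(hidden, dim) / dim**0.5)
        self.b = nn.Parameter(torch.zeros(hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # g(x) = c * sum_i softplus(w_i . x + b_i) is convex; its Hessian is
        # c * W^T diag(sigmoid'(.)) W <= (c/4) ||W||_2^2 I, so choosing
        # c = 4 (beta - alpha) / ||W||_2^2 certifies the smoothness bound.
        spec = torch.linalg.matrix_norm(self.W, ord=2)
        c = 4.0 * (self.beta - self.alpha) / spec.pow(2)
        # grad g(x) = c * W^T sigmoid(W x + b), computed batch-wise below.
        grad_g = c * torch.sigmoid(x @ self.W.T + self.b) @ self.W
        return self.alpha * x + grad_g

# Quick empirical check of the certified bounds on random pairs.
f = BiLipschitzGradientMap(dim=8, hidden=32, alpha=0.5, beta=2.0)
x, y = torch.randn(256, 8), torch.randn(256, 8)
ratio = (f(x) - f(y)).norm(dim=1) / (x - y).norm(dim=1)
print(ratio.min().item(), ratio.max().item())  # stays within [0.5, 2.0]
```

A further consequence of the Fenchel duality between strong convexity and gradient smoothness: the inverse of such a map is itself a gradient map, ∇F*, where F* is the convex conjugate of F, and it is bi-Lipschitz with constants 1/β and 1/α. This is what makes the parameterization "direct": both the forward and inverse sensitivity are pinned down by the two user-chosen constants.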
