
Unveiling the Hessian's Connection to the Decision Boundary (2306.07104v1)

Published 12 Jun 2023 in cs.LG, cond-mat.dis-nn, and stat.ML

Abstract: Understanding the properties of well-generalizing minima is at the heart of deep learning research. On the one hand, the generalization of neural networks has been connected to the decision boundary complexity, which is hard to study in the high-dimensional input space. On the other hand, the flatness of a minimum has become a controversial proxy for generalization. In this work, we provide the missing link between the two approaches and show that the Hessian top eigenvectors characterize the decision boundary learned by the neural network. Notably, the number of outliers in the Hessian spectrum is proportional to the complexity of the decision boundary. Based on this finding, we provide a new and straightforward approach to studying the complexity of a high-dimensional decision boundary; show that this connection naturally inspires a new generalization measure; and finally, we develop a novel margin estimation technique which, in combination with the generalization measure, precisely identifies minima with simple wide-margin boundaries. Overall, this analysis establishes the connection between the Hessian and the decision boundary and provides a new method to identify minima with simple wide-margin decision boundaries.
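
The abstract's central quantities, the top Hessian eigenvectors and the number of spectral outliers, can be estimated without forming the full Hessian. Below is a minimal sketch, assuming a toy PyTorch classifier: it uses Pearlmutter-style Hessian-vector products and power iteration with deflation to approximate the leading eigenvalues of the training-loss Hessian, then counts outliers with a simple heuristic threshold. The network, data, and threshold are illustrative assumptions, not the paper's implementation.

# Minimal sketch (illustrative, not the authors' code): estimate the top
# Hessian eigenvalues of a small classifier's loss via Hessian-vector
# products (Pearlmutter's trick) and count spectral "outliers".
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-class data and a small MLP (assumed setup for illustration).
X = torch.randn(256, 2)
y = (X[:, 0] * X[:, 1] > 0).long()
model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

params = [p for p in model.parameters() if p.requires_grad]
n_params = sum(p.numel() for p in params)

def hvp(vec):
    # Hessian-vector product of the training loss at the current parameters.
    loss = loss_fn(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    grad_v = (flat_grad * vec).sum()
    hv = torch.autograd.grad(grad_v, params)
    return torch.cat([h.reshape(-1) for h in hv]).detach()

def top_eigenpairs(k=10, iters=50):
    # Power iteration with deflation against previously found eigenvectors.
    eigvals, eigvecs = [], []
    for _ in range(k):
        v = torch.randn(n_params)
        v /= v.norm()
        for _ in range(iters):
            hv = hvp(v)
            # Deflate: project out already-found top directions.
            for u in eigvecs:
                hv = hv - (hv @ u) * u
            v = hv / (hv.norm() + 1e-12)
        eigvals.append((v @ hvp(v)).item())  # Rayleigh quotient
        eigvecs.append(v)
    return eigvals, eigvecs

eigvals, eigvecs = top_eigenpairs(k=10)
# Heuristic outlier count: eigenvalues well separated from the rest of the
# estimated spectrum (threshold chosen for illustration only).
bulk = sorted(eigvals)[: len(eigvals) // 2]
bulk_scale = max(abs(v) for v in bulk) if bulk else 1.0
outliers = [v for v in eigvals if v > 5 * bulk_scale]
print("top eigenvalues:", [round(v, 3) for v in eigvals])
print("estimated number of spectral outliers:", len(outliers))

In the paper's framing, a larger outlier count would indicate a more complex decision boundary; the separation heuristic above is only a stand-in for the paper's actual analysis.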
