Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting (2404.07083v2)
Abstract: Overparameterized deep neural networks (DNNs), if not sufficiently regularized, are susceptible to overfitting their training examples and not generalizing well to test data. To discourage overfitting, researchers have developed multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance in one or more layers of the network. By analyzing the penultimate feature layer activations output by a DNN's feature extraction section prior to the linear classifier, we find that modified forms of the intra-class feature covariance and inter-class prototype separation are key components of a fundamental Chebyshev upper bound on the probability of misclassification, which we designate the Chebyshev Prototype Risk (CPR). While previous approaches' covariance loss terms scale quadratically with the number of network features, our CPR bound indicates that an approximate covariance loss in log-linear time is sufficient to reduce the bound and is scalable to large architectures. We implement the terms of the CPR bound into our Explicit CPR (exCPR) loss function and observe from empirical results on multiple datasets and network architectures that our training algorithm reduces overfitting and improves upon previous approaches in many settings. Our code is available at https://github.com/Deano1718/Regularization_exCPR.
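The bound described in the abstract descends from Chebyshev's inequality, which guarantees $P(|X - \mu| \geq k\sigma) \leq 1/k^2$ for any random variable with finite variance. Intuitively, if a class's penultimate features concentrate tightly around their class prototype (small intra-class covariance) while prototypes sit far apart (large inter-class separation), the probability that a feature vector lands closer to a wrong prototype is provably small. As illustration only, the PyTorch sketch below implements a hypothetical CPR-style penalty with these two terms; the function name `cpr_style_penalty`, the sorted adjacent-pair covariance approximation, and the exponential separation term are all assumptions made here, not the authors' exCPR implementation (see the linked repository for that).

```python
# Hypothetical sketch of a CPR-style regularizer (not the authors' exCPR code).
# Assumes `feats` are penultimate-layer activations of shape (batch, d),
# `labels` are integer class ids, and `prototypes` (num_classes, d) holds
# per-class feature-mean estimates, e.g. running class means during training.

import torch

def cpr_style_penalty(feats: torch.Tensor,
                      labels: torch.Tensor,
                      prototypes: torch.Tensor) -> torch.Tensor:
    """Toy intra-class covariance + inter-class separation penalty."""
    # Deviations of each sample from its own class prototype.
    centered = feats - prototypes[labels]

    # --- Approximate intra-class covariance term ---
    # Pairing adjacent features after an O(d log d) sort by variance is a
    # stand-in for the paper's log-linear approximate covariance loss; the
    # exact pairing scheme used by the authors is defined in the paper.
    var = centered.var(dim=0)
    order = torch.argsort(var, descending=True)
    a = centered[:, order[0::2]]
    b = centered[:, order[1::2]]
    k = min(a.shape[1], b.shape[1])          # handles odd feature counts
    cov_term = (a[:, :k] * b[:, :k]).mean(dim=0).abs().mean()

    # --- Inter-class prototype separation term ---
    # Penalize prototypes that sit close together; the exponential is small
    # when all pairwise prototype distances are large.
    pdist = torch.cdist(prototypes, prototypes)
    mask = ~torch.eye(prototypes.shape[0], dtype=torch.bool,
                      device=prototypes.device)
    sep_term = torch.exp(-pdist[mask]).mean()

    return cov_term + sep_term

# Hypothetical usage inside a training step (lam is a tuning weight):
#   loss = criterion(logits, labels) + lam * cpr_style_penalty(feats, labels, prototypes)
```

In a sketch like this, the prototypes would typically be maintained as running class means of the penultimate features, matching the abstract's description of the feature extractor's output; how and when they are updated is left open here.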