Reduced Jeffries-Matusita Distance: A Novel Loss Function to Improve Generalization Performance of Deep Classification Models (2403.08408v1)
Abstract: The generalization performance of deep neural networks in classification tasks is a major concern in machine learning research. Despite widespread techniques used to mitigate overfitting, such as data augmentation, pseudo-labeling, regularization, and ensemble learning, this performance still needs to be enhanced with other approaches. In recent years, it has been theoretically demonstrated that the characteristics of a loss function, i.e., its Lipschitzness and maximum value, affect the generalization performance of deep neural networks, which can serve as guidance for proposing novel distance measures. In this paper, by analyzing these characteristics, we introduce a distance called Reduced Jeffries-Matusita as a loss function for training deep classification models to reduce overfitting. In our experiments, we evaluate the new loss function on two different problems: image classification in computer vision and node classification in graph learning. The results show that the new distance measure stabilizes the training process significantly, enhances generalization, and improves model performance in terms of Accuracy and F1-score, even when the training set is small.
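To make the idea concrete, below is a minimal PyTorch sketch of using a Jeffries-Matusita-style distance as a classification loss between the predicted class distribution and a one-hot target. This implements the classical (Jeffries-)Matusita distance, not necessarily the paper's "Reduced" variant, whose exact form is not reproduced in the abstract; the class name `JeffriesMatusitaLoss` and the `eps` guard are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JeffriesMatusitaLoss(nn.Module):
    """Classification loss based on the classical Jeffries-Matusita (Matusita)
    distance between the predicted class distribution and the one-hot target.

    Note: this is the classical distance as a stand-in; the paper's "Reduced"
    variant modifies it, and its exact definition is not given in the abstract.
    """

    def __init__(self, eps: float = 1e-12):
        super().__init__()
        self.eps = eps  # numerical guard for sqrt at zero

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (batch, num_classes); target: (batch,) holding class indices
        probs = F.softmax(logits, dim=-1)
        one_hot = F.one_hot(target, num_classes=probs.size(-1)).to(probs.dtype)
        # Matusita distance: sqrt( sum_i (sqrt(p_i) - sqrt(q_i))^2 ).
        # For a one-hot q this reduces to sqrt(2 - 2*sqrt(p_true_class)),
        # which is bounded by sqrt(2), unlike cross-entropy.
        diff = torch.sqrt(probs + self.eps) - torch.sqrt(one_hot + self.eps)
        per_sample = torch.sqrt((diff ** 2).sum(dim=-1) + self.eps)
        return per_sample.mean()


# Usage: a drop-in replacement for nn.CrossEntropyLoss in a training loop.
# criterion = JeffriesMatusitaLoss()
# loss = criterion(model(x), y)
```

The bounded range and smoother gradients of such a distance are the kind of loss-function properties (maximum value, Lipschitzness) that the paper argues influence generalization.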