
Class-wise Generalization Error: an Information-Theoretic Analysis (2401.02904v1)

Published 5 Jan 2024 in cs.LG and stat.ML

Abstract: Existing generalization theories of supervised learning typically take a holistic approach and provide bounds for the expected generalization over the whole data distribution, which implicitly assumes that the model generalizes similarly for all the classes. In practice, however, there are significant variations in generalization performance among different classes, which cannot be captured by the existing generalization bounds. In this work, we tackle this problem by theoretically studying the class-generalization error, which quantifies the generalization performance of each individual class. We derive a novel information-theoretic bound for class-generalization error using the KL divergence, and we further obtain several tighter bounds using the conditional mutual information (CMI), which are significantly easier to estimate in practice. We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior. Moreover, we show that the theoretical tools developed in this paper can be applied in several applications beyond this context.
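
For orientation, here is a minimal sketch of how the quantity in the abstract is typically formalized, following the standard information-theoretic template (in the spirit of Xu and Raginsky's mutual-information bounds). This is an illustrative assumption, not the paper's exact statement; the precise definitions, conditioning, and constants are those given in the paper.

% Illustrative sketch only; notation assumed, not taken from the paper.
% S = (Z_1, ..., Z_n) drawn i.i.d. from \mu, with Z_i = (X_i, Y_i);
% W is the hypothesis returned by the learning algorithm P_{W|S};
% \ell(w, z) is a loss assumed to be \sigma-sub-Gaussian.
%
% Class-generalization error of class y: expected loss on fresh samples of
% class y minus the empirical loss on the training samples of class y.
\[
  \overline{\mathrm{gen}}_y
  \;=\;
  \mathbb{E}_{W,S}\!\left[
      \mathbb{E}_{Z\sim\mu}\bigl[\ell(W,Z)\,\big|\,Y=y\bigr]
      \;-\;
      \frac{1}{|\{i: Y_i=y\}|}\sum_{i:\,Y_i=y}\ell(W,Z_i)
    \right]
\]
% A KL-divergence bound in the usual Donsker--Varadhan style then reads,
% schematically (class-dependent terms and constants omitted):
\[
  \bigl|\overline{\mathrm{gen}}_y\bigr|
  \;\lesssim\;
  \mathbb{E}_{S}\sqrt{\,2\sigma^{2}\,
    D_{\mathrm{KL}}\!\bigl(P_{W\mid S}\,\big\|\,P_{W}\bigr)}
\]

The CMI-based variants mentioned in the abstract follow the same template but, in the standard conditional-mutual-information construction, condition on a supersample and measure only which of two candidate points was used for training; that information term is bounded by design, which is what makes such bounds easier to estimate in practice.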

