Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness (2310.06161v1)

Published 9 Oct 2023 in cs.LG and stat.ML

Abstract: Neural networks (NNs) are known to exhibit simplicity bias, where they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative. Simplicity bias can lead the model to make biased predictions that have poor out-of-distribution (OOD) generalization. To address this, we propose a framework that encourages the model to use a more diverse set of features to make predictions. We first train a simple model, and then regularize the conditional mutual information with respect to it to obtain the final model. We demonstrate the effectiveness of this framework in various problem settings and real-world applications, showing that it effectively addresses simplicity bias and leads to more features being used, enhances OOD generalization, and improves subgroup robustness and fairness. We complement these results with theoretical analyses of the effect of the regularization and its OOD generalization properties.
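
The abstract describes a two-stage recipe: first train a simple model, then train the final model with a conditional-mutual-information (CMI) regularizer computed with respect to the simple model. The sketch below is one plausible instantiation of that idea, not the paper's exact formulation: it penalizes a plug-in estimate of the dependence between the final model's soft predictions and the simple model's soft predictions given the label, so the final model is discouraged from relying on the same simple features. The estimator, the `lambda_cmi` weight, and the training loop details are illustrative assumptions.

```python
# Hedged sketch of the two-stage framework from the abstract:
# (1) train a simple model, (2) train the final model with a penalty on the
# conditional mutual information between the two models' predictions given
# the label. The plug-in CMI estimator and lambda_cmi are assumptions.
import torch
import torch.nn.functional as F


def conditional_mutual_information(p_simple, p_final, labels, num_classes, eps=1e-8):
    """Plug-in estimate of I(pred_simple; pred_final | Y) from batch soft predictions."""
    cmi = p_final.new_zeros(())
    for y in range(num_classes):
        mask = labels == y
        if mask.sum() < 2:
            continue
        ps, pf = p_simple[mask], p_final[mask]                  # (n_y, C) each
        joint = (ps.unsqueeze(2) * pf.unsqueeze(1)).mean(0)     # estimate of p(a, b | y)
        marg_s = joint.sum(dim=1, keepdim=True)                 # p(a | y)
        marg_f = joint.sum(dim=0, keepdim=True)                 # p(b | y)
        kl = (joint * (torch.log(joint + eps) - torch.log(marg_s * marg_f + eps))).sum()
        cmi = cmi + mask.float().mean() * kl                    # weight by empirical p(y)
    return cmi


def train_final_model(final_model, simple_model, loader, num_classes,
                      lambda_cmi=1.0, epochs=10, lr=1e-3, device="cpu"):
    """Stage 2: fit the final model with cross-entropy plus the CMI penalty."""
    simple_model.eval()
    opt = torch.optim.Adam(final_model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():                               # simple model is frozen
                p_simple = F.softmax(simple_model(x), dim=1)
            logits = final_model(x)
            p_final = F.softmax(logits, dim=1)
            loss = F.cross_entropy(logits, y) + lambda_cmi * conditional_mutual_information(
                p_simple, p_final, y, num_classes)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return final_model
```

Under these assumptions, stage 1 is ordinary empirical-risk training of a low-capacity model, and stage 2 reuses its frozen predictions only inside the regularizer; refer to the paper for the actual estimator and theoretical guarantees.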

Authors (3)
  1. Bhavya Vasudeva (12 papers)
  2. Kameron Shahabi (2 papers)
  3. Vatsal Sharan (39 papers)
Citations (2)
