
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations (2403.03375v3)

Published 5 Mar 2024 in cs.LG

Abstract: Existing research often posits spurious features as easier to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored. Moreover, prior studies mainly focus on end performance rather than the dynamics of feature learning. In this paper, we propose a theoretical framework and an associated synthetic dataset grounded in boolean function analysis. This setup allows fine-grained control over the relative complexity (compared to core features) and correlation strength (with respect to the label) of spurious features, in order to study the dynamics of feature learning under spurious correlations. Our findings uncover several interesting phenomena: (1) stronger spurious correlations or simpler spurious features slow down the learning of core features, (2) two distinct subnetworks are formed to learn core and spurious features separately, (3) the learning phases of spurious and core features are not always separable, (4) spurious features are not forgotten even after core features are fully learned. We demonstrate that our findings justify the success of retraining the last layer to remove spurious correlations, and also identify limitations of popular debiasing algorithms that exploit the early learning of spurious features. We support our empirical findings with theoretical analyses for the case of learning XOR features with a one-hidden-layer ReLU network.
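To make the setup concrete, here is a minimal sketch of such a dataset and network. This is an illustration, not the authors' released code: the generator `make_spurious_boolean_data`, the input dimension, the hidden width, and the training hyperparameters are all assumptions chosen for clarity. The label is the XOR (degree-2 parity) of two core coordinates, while a single, simpler (degree-1) spurious coordinate is forced to agree with the label with probability `corr`; evaluating on data regenerated with `corr = 0.5` tracks how well the core feature alone has been learned.

```python
import numpy as np
import torch
import torch.nn as nn

def make_spurious_boolean_data(n, d=10, core_idx=(0, 1), spur_idx=(2,),
                               corr=0.9, seed=0):
    """Boolean examples in {-1, +1}^d. The label is the parity (XOR) of the
    core coordinates; the spurious parity is forced to agree with the label
    on a `corr` fraction of examples. A smaller `spur_idx` set gives a
    lower-degree, i.e. simpler, spurious feature."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n, d))
    y = np.prod(X[:, list(core_idx)], axis=1)        # core feature: degree-2 parity
    desired = np.where(rng.random(n) < corr, y, -y)  # target spurious parity per example
    spur = np.prod(X[:, list(spur_idx)], axis=1)
    mismatch = spur * desired < 0
    X[mismatch, spur_idx[0]] *= -1.0                 # flipping one coordinate flips the parity
    return X.astype(np.float32), y.astype(np.float32)

# One-hidden-layer ReLU network, the architecture analyzed in the paper's theory.
X, y = make_spurious_boolean_data(n=20_000, corr=0.9)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y)
model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

# Held-out set with the spurious feature decorrelated from the label (corr = 0.5),
# so accuracy here reflects learning of the core feature only.
Xe, ye = make_spurious_boolean_data(n=4_000, corr=0.5, seed=1)
Xev, yev = torch.from_numpy(Xe), torch.from_numpy(ye)

for step in range(2_001):
    opt.zero_grad()
    loss = nn.functional.soft_margin_loss(model(Xt).squeeze(-1), yt)  # logistic loss on +-1 labels
    loss.backward()
    opt.step()
    if step % 200 == 0:
        with torch.no_grad():
            acc = (torch.sign(model(Xev).squeeze(-1)) == yev).float().mean().item()
        print(f"step {step:5d}  train loss {loss.item():.3f}  core-only acc {acc:.3f}")
```

Sweeping `corr` toward 1, or keeping the spurious feature at lower degree than the core parity, gives the kind of knob the paper uses for finding (1); freezing the hidden layer and refitting only the output layer on the decorrelated data would mirror the last-layer-retraining result.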

Authors (3)
  1. GuanWen Qiu (3 papers)
  2. Da Kuang (10 papers)
  3. Surbhi Goel (44 papers)
Citations (6)
