Unraveling the Key Components of OOD Generalization via Diversification (2312.16313v3)

Published 26 Dec 2023 in cs.LG

Abstract: Supervised learning datasets may contain multiple cues that explain the training set equally well, i.e., learning any of them would lead to correct predictions on the training data. However, many of them can be spurious, i.e., lose their predictive power under a distribution shift and consequently fail to generalize to out-of-distribution (OOD) data. Recently developed "diversification" methods (Lee et al., 2023; Pagliardini et al., 2023) approach this problem by finding multiple diverse hypotheses that rely on different features. This paper aims to study this class of methods and identify the key components contributing to their OOD generalization abilities. We show that (1) diversification methods are highly sensitive to the distribution of the unlabeled data used for diversification and can underperform significantly when that distribution strays from a method-specific sweet spot. (2) Diversification alone is insufficient for OOD generalization. The choice of learning algorithm, e.g., the model's architecture and pretraining, is crucial. In standard experiments (classification on the Waterbirds and Office-Home datasets), using the second-best choice leads to up to a 20% absolute drop in accuracy. (3) The optimal choice of learning algorithm depends on the unlabeled data, and vice versa, i.e., they are co-dependent. (4) Finally, we show that, in practice, the above pitfalls cannot be alleviated by increasing the number of diverse hypotheses, the defining feature of diversification methods. These findings provide a clearer understanding of the critical design factors influencing the OOD generalization abilities of diversification methods. They can guide practitioners in how best to use existing methods and guide researchers in developing new, better ones.
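
To make the setup concrete, below is a minimal Python (PyTorch) sketch of a diversification objective in the spirit of the cited methods: several hypotheses must all fit the labeled training data while a penalty discourages them from agreeing on unlabeled data. The function name, the specific agreement penalty, and the weight alpha are illustrative assumptions, not the exact formulation of either cited paper.

# Minimal sketch of a diversification objective in the spirit of
# Lee et al. (2023) and Pagliardini et al. (2023): two hypotheses are
# trained to fit the labeled data while a penalty discourages them from
# agreeing on unlabeled data. Names and the exact penalty are
# illustrative, not the formulation of either paper.
import torch.nn.functional as F

def diversification_loss(h1, h2, x_lab, y, x_unlab, alpha=1.0):
    # Both hypotheses must explain the labeled training set equally well.
    fit = F.cross_entropy(h1(x_lab), y) + F.cross_entropy(h2(x_lab), y)

    # On unlabeled data, penalize agreement: the inner product of the two
    # predictive distributions is the probability that independently
    # sampled predictions would coincide.
    p1 = F.softmax(h1(x_unlab), dim=-1)
    p2 = F.softmax(h2(x_unlab), dim=-1)
    agreement = (p1 * p2).sum(dim=-1).mean()

    return fit + alpha * agreement

As the paper's findings suggest, the behavior of such an objective hinges on where x_unlab comes from and on the architecture and pretraining behind h1 and h2, not only on the disagreement term itself.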

References (61)
  1. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/62dad6e273d32235ae02b7d321578ee8-Paper.pdf.
  2. Task Discovery: Finding the Tasks that Neural Networks Generalize on. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (eds.), Advances in Neural Information Processing Systems, volume 35, pp.  15702–15717. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/64ad7b36b497f375ded2e6f15713ed4c-Paper-Conference.pdf.
  3. Agreement-on-the-line: Predicting the Performance of Neural Networks under Distribution Shift. October 2022. URL https://openreview.net/forum?id=EZZsnke1kt.
  4. Relational inductive biases, deep learning, and graph networks, October 2018. URL http://arxiv.org/abs/1806.01261. arXiv:1806.01261 [cs, stat].
  5. Recognition in Terra Incognita, July 2018. URL http://arxiv.org/abs/1807.04975. arXiv:1807.04975 [cs, q-bio].
  6. A note on a result in the theory of code construction. Information and Control, 2(2):183–194, June 1959. ISSN 0019-9958. doi: 10.1016/S0019-9958(59)90376-6. URL https://www.sciencedirect.com/science/article/pii/S0019995859903766.
  7. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  9912–9924. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/70feb62b69f16e0238f741fab228fec2-Paper.pdf.
  8. Emerging Properties in Self-Supervised Vision Transformers, May 2021. URL http://arxiv.org/abs/2104.14294. arXiv:2104.14294 [cs].
  9. Big Self-Supervised Models are Strong Semi-Supervised Learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  22243–22255. Curran Associates, Inc., 2020a. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/fcbc95ccdd551da181207c0c1400c655-Paper.pdf.
  10. Improved Baselines with Momentum Contrastive Learning, March 2020b. URL http://arxiv.org/abs/2003.04297. arXiv:2003.04297 [cs].
  11. Environment Inference for Invariant Learning. In Proceedings of the 38th International Conference on Machine Learning, pp.  2189–2200. PMLR, July 2021. URL https://proceedings.mlr.press/v139/creager21a.html. ISSN: 2640-3498.
  12. Underspecification Presents Challenges for Credibility in Modern Machine Learning, November 2020. URL http://arxiv.org/abs/2011.03395. arXiv:2011.03395 [cs, stat].
  13. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, June 2021. URL http://arxiv.org/abs/2010.11929. arXiv:2010.11929 [cs].
  14. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11):665–673, November 2020. ISSN 2522-5839. doi: 10.1038/s42256-020-00257-z. URL https://www.nature.com/articles/s42256-020-00257-z.
  15. Implicit Bias of Gradient Descent on Linear Convolutional Networks. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper/2018/hash/0e98aeeb54acf612b9eb4e48a269814c-Abstract.html.
  16. Let’s Agree to Agree: Neural Networks Share Classification Order on Real Datasets. In Proceedings of the 37th International Conference on Machine Learning, pp.  3950–3960. PMLR, November 2020. URL https://proceedings.mlr.press/v119/hacohen20a.html. ISSN: 2640-3498.
  17. Deep Residual Learning for Image Recognition, December 2015. URL http://arxiv.org/abs/1512.03385. arXiv:1512.03385 [cs].
  18. Masked Autoencoders Are Scalable Vision Learners, December 2021. URL http://arxiv.org/abs/2111.06377. arXiv:2111.06377 [cs].
  19. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, January 1989. ISSN 0893-6080. doi: 10.1016/0893-6080(89)90020-8. URL https://www.sciencedirect.com/science/article/pii/0893608089900208.
  20. Does Distributionally Robust Supervised Learning Give Robust Classifiers? In Proceedings of the 35th International Conference on Machine Learning, pp.  2029–2037. PMLR, July 2018. URL https://proceedings.mlr.press/v80/hu18a.html. ISSN: 2640-3498.
  21. Densely Connected Convolutional Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.  2261–2269, July 2017. doi: 10.1109/CVPR.2017.243. ISSN: 1063-6919.
  22. The Low-Rank Simplicity Bias in Deep Networks, March 2023. URL http://arxiv.org/abs/2103.10427. arXiv:2103.10427 [cs].
  23. Inductive Bias. In Werner Dubitzky, Olaf Wolkenhauer, Kwang-Hyun Cho, and Hiroki Yokota (eds.), Encyclopedia of Systems Biology, pp.  1018–1018. Springer, New York, NY, 2013. ISBN 978-1-4419-9863-7. doi: 10.1007/978-1-4419-9863-7_927. URL https://doi.org/10.1007/978-1-4419-9863-7_927.
  24. Probing as Quantifying Inductive Bias. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.  1839–1851, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.129. URL https://aclanthology.org/2022.acl-long.129.
  25. Directional convergence and alignment in deep learning. In Advances in Neural Information Processing Systems, volume 33, pp.  17176–17186. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/c76e4b2fa54f8506719a5c0dc14c2eb9-Abstract.html.
  26. Assessing Generalization of SGD via Disagreement, May 2022. URL http://arxiv.org/abs/2106.13799. arXiv:2106.13799 [cs, stat].
  27. SGD on Neural Networks Learns Functions of Increasing Complexity. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper/2019/hash/b432f34c5a997c8e7c806a895ecc5e25-Abstract.html.
  28. WILDS: A Benchmark of in-the-Wild Distribution Shifts, July 2021. URL http://arxiv.org/abs/2012.07421. arXiv:2012.07421 [cs].
  29. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  30. Dropout Disagreement: A Recipe for Group Robustness with Fewer Annotations. November 2022. URL https://openreview.net/forum?id=3OxII8ZB3A.
  31. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998. ISSN 1558-2256. doi: 10.1109/5.726791.
  32. Diversify and Disambiguate: Out-of-Distribution Robustness via Disagreement. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/pdf?id=RVTOp3MwT3n.
  33. MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts. January 2022. URL https://openreview.net/forum?id=MTex8qKavoS.
  34. Just Train Twice: Improving Group Robustness without Training Group Information. In Proceedings of the 38th International Conference on Machine Learning, pp.  6781–6792. PMLR, July 2021. URL https://proceedings.mlr.press/v139/liu21f.html. ISSN: 2640-3498.
  35. Bad Global Minima Exist and SGD Can Reach Them. In Advances in Neural Information Processing Systems, volume 33, pp.  8543–8552. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/618491e20a9b686b79e158c293ab4f91-Abstract.html.
  36. Deep Learning Face Attributes in the Wild. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV ’15, pp.  3730–3738, USA, December 2015. IEEE Computer Society. ISBN 978-1-4673-8391-2. doi: 10.1109/ICCV.2015.425. URL https://doi.org/10.1109/ICCV.2015.425.
  37. Predicting Inductive Biases of Pre-Trained Models. March 2021. URL https://openreview.net/forum?id=mNtmhaDkAr.
  38. Intriguing Properties of Vision Transformers. In Advances in Neural Information Processing Systems, volume 34, pp.  23296–23308. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/hash/c404a5adbf90e09631678b13b05d9d7a-Abstract.html.
  39. Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging. Proceedings of the ACM Conference on Health, Inference, and Learning, 2020:151–159, April 2020. doi: 10.1145/3368555.3384468.
  40. Agree to Disagree: Diversity through Disagreement for Better Transferability. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/pdf?id=K7CbYQbyYhY.
  41. Atri Rudra. Lecture 16: Plotkin Bound. October 2007. URL https://cse.buffalo.edu/faculty/atri/courses/coding-theory/lectures/lect16.pdf.
  42. ImageNet Large Scale Visual Recognition Challenge, January 2015. URL http://arxiv.org/abs/1409.0575. arXiv:1409.0575 [cs].
  43. Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization, April 2020. URL http://arxiv.org/abs/1911.08731. arXiv:1911.08731 [cs, stat].
  44. Do Adversarially Robust ImageNet Models Transfer Better? In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  3533–3545. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/24357dd085d2c4b1a88a7e0692e60294-Paper.pdf.
  45. Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective, February 2022. URL http://arxiv.org/abs/2110.03095. arXiv:2110.03095 [cs, stat].
  46. The Pitfalls of Simplicity Bias in Neural Networks. In Advances in Neural Information Processing Systems, volume 33, pp.  9573–9585. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/6cfe0e6127fa25df2a0ef2ae1067d915-Abstract.html.
  47. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge, 2014. ISBN 978-1-107-05713-5. doi: 10.1017/CBO9781107298019. URL https://www.cambridge.org/core/books/understanding-machine-learning/3059695661405D25673058E43C8BE2A6.
  48. No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems. In Advances in Neural Information Processing Systems, volume 33, pp.  19339–19352. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/e0688d13958a19e087e123148555e4b4-Abstract.html.
  49. S. A. Stepanov. Nonlinear codes from modified Butson–Hadamard matrices. Discrete Mathematics and Applications, 16(5):429–438, September 2006. ISSN 1569-3929. doi: 10.1515/156939206779238463. URL https://www.degruyter.com/document/doi/10.1515/156939206779238463/html.
  50. S. A. Stepanov. Nonlinear q-ary codes with large code distance. Problems of Information Transmission, 53(3):242–250, July 2017. ISSN 1608-3253. doi: 10.1134/S003294601703005X. URL https://doi.org/10.1134/S003294601703005X.
  51. Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization, September 2022a. URL http://arxiv.org/abs/2105.05612. arXiv:2105.05612 [cs].
  52. Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning, July 2022b. URL http://arxiv.org/abs/2207.02598. arXiv:2207.02598 [cs].
  53. V. Vapnik. Principles of risk minimization for learning theory. In Proceedings of the 4th International Conference on Neural Information Processing Systems, NIPS’91, pp.  831–838, San Francisco, CA, USA, December 1991. Morgan Kaufmann Publishers Inc. ISBN 978-1-55860-222-9.
  54. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. In Vladimir Vovk, Harris Papadopoulos, and Alexander Gammerman (eds.), Measures of Complexity: Festschrift for Alexey Chervonenkis, pp.  11–30. Springer International Publishing, Cham, 2015. ISBN 978-3-319-21852-6. doi: 10.1007/978-3-319-21852-6_3. URL https://doi.org/10.1007/978-3-319-21852-6_3.
  55. Deep Hashing Network for Unsupervised Domain Adaptation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.  5385–5394, July 2017. doi: 10.1109/CVPR.2017.572. ISSN: 1063-6919.
  56. Assaying Out-Of-Distribution Generalization in Transfer Learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (eds.), Advances in Neural Information Processing Systems, volume 35, pp.  7181–7198. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/2f5acc925919209370a3af4eac5cad4a-Paper-Conference.pdf.
  57. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, September 2017. URL http://arxiv.org/abs/1708.07747. arXiv:1708.07747 [cs, stat].
  58. ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. In Advances in Neural Information Processing Systems, volume 34, pp.  28522–28535. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/hash/efb76cff97aaf057654ef2f38cd77d73-Abstract.html.
  59. Understanding deep learning requires rethinking generalization, February 2017. URL http://arxiv.org/abs/1611.03530. arXiv:1611.03530 [cs].
  60. Coping with Label Shift via Distributionally Robust Optimisation. January 2021. URL https://openreview.net/forum?id=BtZhsSGNRNi.
  61. Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations. In Proceedings of the 39th International Conference on Machine Learning, pp.  26484–26516. PMLR, June 2022. URL https://proceedings.mlr.press/v162/zhang22z.html. ISSN: 2640-3498.