Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases (2303.05470v3)
Abstract: The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data. For example, a classifier may misclassify dog breeds based on the backgrounds of dog images: if particular backgrounds are correlated with particular breeds in the training data, the classifier errs at test time on images where this correlation does not hold. Previous SC benchmark datasets suffer from various issues, e.g., over-saturation, or containing only one-to-one (O2O) SCs but no many-to-many (M2M) SCs, which arise between groups of spurious attributes and classes. In this paper, we present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations between classes and backgrounds. To create this dataset, we employ a text-to-image model to generate photo-realistic images and an image captioning model to filter out unsuitable ones. The resulting dataset is of high quality and contains approximately 152k images. Our experimental results demonstrate that state-of-the-art group robustness methods struggle with Spawrious, most notably on the Hard splits: none of them exceeds $70\%$ accuracy on the hardest split using a ResNet50 pretrained on ImageNet. By examining model misclassifications, we detect reliances on spurious backgrounds, demonstrating that our dataset poses a significant challenge.
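The generate-then-filter pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the keyword-matching rule, function names, and sample captions are all assumptions; the abstract only states that a captioning model is used to filter out unsuitable generated images.

```python
def caption_mentions_class(caption: str, class_keywords: list[str]) -> bool:
    """Return True if the caption contains any keyword for the intended class.

    A hypothetical acceptance rule: a generated image is kept only if the
    captioning model's description mentions the class it was prompted for.
    """
    caption_lower = caption.lower()
    return any(kw.lower() in caption_lower for kw in class_keywords)


def filter_generated_images(captions: dict[str, str],
                            class_keywords: list[str]) -> list[str]:
    """Keep only the ids of images whose caption mentions the target class."""
    return [img_id for img_id, cap in captions.items()
            if caption_mentions_class(cap, class_keywords)]


# Example: captions produced by an image captioning model for images that a
# text-to-image model was prompted to generate of a dachshund on a background.
captions = {
    "img_001": "a small dachshund standing in the snow",
    "img_002": "a mountain landscape at sunset",   # no dog visible: filtered out
    "img_003": "a brown dachshund on a beach",
}
kept = filter_generated_images(captions, ["dachshund", "sausage dog"])
```

In practice the captions would come from a model such as the vit-gpt2 captioner cited in the paper, and the keyword rule could be replaced by any stricter semantic check; the point is only that filtering operates on caption text rather than raw pixels.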