
Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation (2402.18919v3)

Published 29 Feb 2024 in cs.CV and cs.LG

Abstract: While standard Empirical Risk Minimization (ERM) training is proven effective for image classification on in-distribution data, it fails to perform well on out-of-distribution samples. One of the main sources of distribution shift for image classification is the compositional nature of images. Specifically, in addition to the main object or component(s) determining the label, some other image components usually exist, which may lead to the shift of input distribution between train and test environments. More importantly, these components may have spurious correlations with the label. To address this issue, we propose Decompose-and-Compose (DaC), which improves robustness to correlation shift by a compositional approach based on combining elements of images. Based on our observations, models trained with ERM usually highly attend to either the causal components or the components having a high spurious correlation with the label (especially in datapoints on which models have a high confidence). In fact, according to the amount of spurious correlation and the easiness of classification based on the causal or non-causal components, the model usually attends to one of these more (on samples with high confidence). Following this, we first try to identify the causal components of images using class activation maps of models trained with ERM. Afterward, we intervene on images by combining them and retraining the model on the augmented data, including the counterfactual ones. Along with its high interpretability, this work proposes a group-balancing method by intervening on images without requiring group labels or information regarding the spurious features during training. The method has an overall better worst group accuracy compared to previous methods with the same amount of supervision on the group labels in correlation shift.


Summary

Decompose-and-Compose: Enhancing Robustness to Spurious Correlation in Image Classification

Introduction

Empirical Risk Minimization (ERM) is the standard training strategy for image classification and performs well on in-distribution (ID) samples. Its effectiveness degrades on out-of-distribution (OOD) samples, however, largely because of its vulnerability to spurious correlations. This limitation is exacerbated in realistic settings where images contain both causal and non-causal components relative to the target label. The paper addresses spurious correlation with an approach dubbed Decompose-and-Compose (DaC), which balances groups by intervening on non-causal components of images to generate new, counterfactual examples. Notably, the method requires no explicit knowledge of group labels or spurious features, a significant step toward robust models that generalize across varied distributions.

Methodology

The foundation of DaC lies in two critical observations:

  • Models trained with ERM tend to focus either on causal parts or on parts that exhibit a high spurious correlation with the target label, especially on samples where the model is confident about its predictions.
  • A granular, compositional analysis, facilitated by class activation maps, indicates that the emphasis (causal or spurious) depends on the relative ease of predicting the label based on these components.

This leads to DaC, which first identifies and decomposes images into causal and non-causal parts using the class activation maps of models pre-trained with ERM. The pivotal step is then generating novel data points by combining the decomposed parts of different samples and retraining the model on the augmented dataset, which includes these synthetically generated counterfactual instances.
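The compose step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a precomputed class activation map per image and uses a simple top-quantile threshold (`keep_fraction` is an illustrative parameter) in place of the paper's mask-selection procedure.

```python
import numpy as np

def causal_mask(cam: np.ndarray, keep_fraction: float = 0.3) -> np.ndarray:
    """Binarize a class activation map, keeping the top `keep_fraction`
    most-activated pixels as the (presumed) causal region."""
    threshold = np.quantile(cam, 1.0 - keep_fraction)
    return cam >= threshold

def compose(img_a: np.ndarray, cam_a: np.ndarray,
            img_b: np.ndarray, keep_fraction: float = 0.3) -> np.ndarray:
    """Paste the high-activation (causal) region of img_a onto img_b,
    producing a counterfactual sample that pairs img_a's causal part
    with img_b's background. The composed image keeps img_a's label."""
    mask = causal_mask(cam_a, keep_fraction)[..., None]  # broadcast over channels
    return np.where(mask, img_a, img_b)

# Toy example: 4x4 "images" with 3 channels.
rng = np.random.default_rng(0)
img_a = rng.random((4, 4, 3))
img_b = rng.random((4, 4, 3))
cam_a = np.zeros((4, 4))
cam_a[:2, :2] = 1.0  # causal region = top-left block
mixed = compose(img_a, cam_a, img_b, keep_fraction=0.25)
```

Retraining on such composed images breaks the tie between the causal object and its original background, which is the intervention the paper describes.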

Evaluation

DaC is evaluated against baseline methods such as DFR, MaskTune, and Group DRO on benchmarks including the Waterbirds, CelebA, MetaShift, and Dominoes datasets, chosen to cover a range of distribution-shift and spurious-correlation settings.
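The headline metric in these comparisons, worst-group accuracy, is simply the minimum per-group accuracy, where a group is typically a (class label, spurious attribute) pair. A small sketch with illustrative data:

```python
from collections import defaultdict

def worst_group_accuracy(preds, labels, groups):
    """Minimum accuracy over groups; each group is usually a
    (class label, spurious attribute) combination."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)

# Toy example: group 0 is classified perfectly, group 1 only half
# the time, so the worst-group accuracy is 0.5 even though the
# average accuracy is 0.75.
preds  = [1, 1, 0, 1]
labels = [1, 1, 0, 0]
groups = [0, 0, 1, 1]
print(worst_group_accuracy(preds, labels, groups))  # 0.5
```

This is why the metric is preferred over average accuracy for spurious-correlation benchmarks: a model that rides the spurious feature can score high on average while failing the minority groups entirely.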

The contributions are multi-fold, emphasizing:

  • A detailed analysis of ERM-trained models' behavior, particularly noting their tendency to focus on either causal or spuriously correlated components based on confidence and loss metrics.
  • An innovative method to identify causal parts within images, capitalizing on the insights obtained from the models' attribution maps.
  • A group-balancing strategy reliant on the strategic combination of image parts to construct new, balanced training data points without the necessity for group label information.
  • Superior performance in worst-group accuracy metrics across a majority of the considered benchmarks, showcasing DaC's efficacy in mitigating the adverse effects of spurious correlation.

Implications and Future Directions

The insights and methodology introduced in this paper have profound implications for both theoretical and practical advancements in AI. Theoretically, DaC provides a nuanced understanding of how models attend to various components of an image and how this attention can be manipulated to foster more robust learning. Practically, the ability to enhance model robustness without explicit reliance on group labels or detailed knowledge of spurious features has far-reaching applications across different domains where robustness and generalizability are paramount.

As for future research avenues, the exploration could extend into:

  • Refining the decomposition and composition mechanisms to increase the precision of causal and non-causal part identification.
  • Applying DaC's principles to other forms of data beyond images, such as text or audio, where spurious correlation is also a significant challenge.
  • Investigating the intersection of DaC with other robustness-enhancing techniques like adversarial training, to further bolster model resilience.

Conclusion

This work reflects an evolving understanding of how models perceive and process images, moving beyond their monolithic treatment toward a decompose-and-recombine strategy. Decompose-and-Compose not only sheds light on the dynamics of spurious correlation but also provides a practical toolkit to address it, a step toward AI systems that are robust and fair by design, not merely performant.
