Identifying Spurious Correlations using Counterfactual Alignment (2312.02186v3)
Abstract: Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual (CF) alignment method to detect and quantify spurious correlations in black-box classifiers. Our method is based on feeding counterfactual images, generated with respect to one classifier, into other classifiers and testing whether they induce changes in those classifiers' outputs as well. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists. We validate this by observing intuitive trends in face-attribute and waterbird classifiers, as well as by fabricating spurious correlations and detecting their presence, both visually and quantitatively. Furthermore, using CF alignment, we demonstrate that robust optimization methods (GroupDRO, JTT, and FLAC) can be evaluated by detecting the reduction in spurious correlations they achieve.
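To make the pipeline concrete, below is a minimal PyTorch sketch of how the CF alignment computation could look. The names `encoder`, `decoder`, `base_clf`, `other_clf`, and the step size `lam` are hypothetical stand-ins, not the paper's API; the counterfactual step follows the latent-shift idea of the Gifsplanation reference below rather than the paper's exact implementation.

```python
import torch

def latent_shift_counterfactual(x, encoder, decoder, clf, lam):
    """Generate a counterfactual for `clf` by shifting the autoencoder
    latent code along the classifier's gradient (latent-shift approach;
    all components here are assumed, illustrative stand-ins)."""
    z = encoder(x).detach().requires_grad_(True)
    y = clf(decoder(z))
    # Gradient of the prediction w.r.t. the latent code.
    grad = torch.autograd.grad(y.sum(), z)[0]
    # Step the latent code to lower (lam < 0) or raise (lam > 0) the output.
    return decoder(z + lam * grad).detach()

def cf_alignment(xs, encoder, decoder, base_clf, other_clf, lam=-100.0):
    """Correlate output changes of `other_clf` with those of `base_clf`
    on counterfactuals generated w.r.t. `base_clf` alone. Classifiers
    are assumed to return one scalar output per image."""
    d_base, d_other = [], []
    for x in xs:  # each x is a batch of one image
        xcf = latent_shift_counterfactual(x, encoder, decoder, base_clf, lam)
        with torch.no_grad():
            d_base.append((base_clf(xcf) - base_clf(x)).item())
            d_other.append((other_clf(xcf) - other_clf(x)).item())
    # Pearson correlation between the two response vectors; rows of the
    # stacked tensor are treated as variables by torch.corrcoef.
    stacked = torch.stack([torch.tensor(d_base), torch.tensor(d_other)])
    return torch.corrcoef(stacked)[0, 1]
```

A correlation near 1 indicates that the two classifiers respond to the same image features; when the second classifier targets an attribute that should be irrelevant to the first classifier's task, this signals a spurious correlation.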
- Understanding intermediate layers using linear classifier probes. In International Conference on Learning Representations, 2016.
- Towards Causal Benchmarking of Bias in Face Analysis Algorithms. In Computer Vision and Pattern Recognition, 2021.
- Latent-CF: A Simple Baseline for Reverse Counterfactual Explanations. In Neural Information Processing Systems (NeurIPS) Fair AI in Finance Workshop, 2020.
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples. In International Joint Conference on Neural Networks, 2020.
- Visual Feature Attribution Using Wasserstein GANs. In Computer Vision and Pattern Recognition, 2018.
- Recognition in Terra Incognita. In European Conference on Computer Vision, 2018.
- RoentGen: Vision-Language Foundation Model for Chest X-ray Generation, 2022.
- Gifsplanation via Latent Shift: A Simple Autoencoder Approach to Counterfactual Generation for Chest X-rays. In Medical Imaging with Deep Learning, 2021.
- Taming Transformers for High-Resolution Image Synthesis. In Computer Vision and Pattern Recognition, 2021.
- Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020.
- Feature Interpretation Using Generative Adversarial Networks (FIGAN): A Framework for Visualizing a CNN’s Learned Features. IEEE Access, 2023.
- xGEMs: Generating Examplars to Explain Black-Box Models, 2018.
- A Style-Based Generator Architecture for Generative Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
- Interpretability beyond feature attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In International Conference on Machine Learning, 2018.
- Captum: A unified and generic model interpretability library for PyTorch, 2020.
- Out-of-Distribution Generalization via Risk Extrapolation. In International Conference on Machine Learning, 2021.
- Deep Learning Face Attributes in the Wild. In International Conference on Computer Vision, 2015.
- Conditional Generative Adversarial Nets, 2014.
- A unifying view on dataset shift in classification. Pattern Recognition, 2012.
- Feature Visualization. Distill, 2017.
- Zoom In: An Introduction to Circuits. Distill, 2020.
- PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Neural Information Processing Systems, 2019.
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
- Interpretations are useful: Penalizing explanations to align neural networks with prior knowledge. In International Conference on Machine Learning, 2020.
- Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. In International Joint Conference on Artificial Intelligence, 2017.
- Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization. In International Conference on Learning Representations, 2020.
- ExplainGAN: Model Explanation via Decision Boundary Crossing Transformations. In European Conference on Computer Vision, 2018.
- Using StyleGAN for Visual Interpretability of Deep Learning Models on Medical Images. In Medical Imaging meets NeurIPS, 2020.
- Chest radiographs in congestive heart failure: Visualizing neural network learning. Radiology, 2019.
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. In International Conference on Learning Representations, 2014.
- Explaining the black-box smoothly—A counterfactual approach. Medical Image Analysis, 2023.
- Striving for Simplicity: The All Convolutional Net. In International Conference on Learning Representations Workshop, 2015.
- Axiomatic Attribution for Deep Networks. In International Conference on Machine Learning, 2017.
- Branched Multi-Task Networks: Deciding What Layers To Share. In British Machine Vision Conference, 2020.
- Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review. In Neural Information Processing Systems (NeurIPS) Retrospectives Workshop, 2020.
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In International Conference on Machine Learning, 2022.
- Fairness-aware training of face attribute classifiers via adversarial robustness. Knowledge-Based Systems, 2023.