Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression (2404.09601v1)
Abstract: Deep neural networks are prone to learning and relying on spurious correlations in the training data, which, in high-risk applications, can have fatal consequences. Various approaches for suppressing model reliance on harmful features have been proposed that can be applied post-hoc without additional training. Although these methods can be applied efficiently, they also tend to harm model performance by globally shifting the distribution of latent features. To mitigate unintended overcorrection of model behavior, we propose a reactive approach conditioned on model-derived knowledge and eXplainable Artificial Intelligence (XAI) insights. While the reactive approach can be applied to many post-hoc methods, we demonstrate it in particular for P-ClArC (Projective Class Artifact Compensation), introducing a new method called R-ClArC (Reactive Class Artifact Compensation). Through rigorous experiments in controlled settings (FunnyBirds) and with a real-world dataset (ISIC2019), we show that introducing reactivity can minimize the detrimental effect of the applied correction while simultaneously ensuring low reliance on spurious features.
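The abstract only sketches the mechanism, but the core idea is straightforward: P-ClArC removes an artifact direction from latent activations globally, whereas R-ClArC applies the same correction only to samples in which the artifact is actually detected. Below is a minimal, hedged NumPy sketch of that idea. The mean-difference artifact direction, the anchor point `mu_clean`, the threshold rule, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fit_artifact_direction(acts_artifact, acts_clean):
    """Estimate an artifact direction in latent space as the difference of
    means between artifact and clean activations (a simple CAV-like choice;
    the paper may use a different estimator)."""
    v = acts_artifact.mean(axis=0) - acts_clean.mean(axis=0)
    return v / np.linalg.norm(v)

def p_clarc(acts, v, mu_clean):
    """Global P-ClArC-style correction: remove the component along v from
    every activation, anchored at a clean reference point mu_clean."""
    scores = (acts - mu_clean) @ v        # signed artifact evidence per sample
    return acts - np.outer(scores, v)     # applied unconditionally to all samples

def r_clarc(acts, v, mu_clean, threshold=0.0):
    """Reactive variant: apply the same projection only to samples whose
    artifact evidence exceeds a threshold, leaving the rest untouched."""
    scores = (acts - mu_clean) @ v
    mask = scores > threshold             # "react" only on flagged samples
    corrected = acts.copy()
    corrected[mask] -= np.outer(scores[mask], v)
    return corrected

# Toy usage: 2-D latent space, artifact shifts samples along the first axis.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(100, 2))
poisoned = clean[:50] + np.array([5.0, 0.0])   # half the samples carry the artifact
v = fit_artifact_direction(poisoned, clean)
mu = clean.mean(axis=0)
batch = np.vstack([poisoned, clean[50:]])
out = r_clarc(batch, v, mu, threshold=2.0)     # only flagged rows are corrected
```

Applied globally, as in `p_clarc`, the projection also shifts the features of clean samples; the conditional mask is what limits this collateral damage, which is the overcorrection effect the abstract reports mitigating.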
- From attribution maps to human-understandable explanations through Concept Relevance Propagation. Nature Machine Intelligence, 5(9):1006–1019, 2023.
- Software for dataset-wide XAI: from local explanations to global insights with zennit, corelay, and virelay. CoRR, abs/2106.13200, 2021.
- Finding and removing Clever Hans: Using explanation methods to debug and improve deep models. Information Fusion, 77:261–295, 2022.
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7):1–46, 2015.
- LEACE: Perfect linear concept erasure in closed form, 2023.
- Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2016.
- Artificial intelligence in medicine: today and tomorrow. Frontiers in Medicine, 7:509744, 2020.
- Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pages 168–172, 2018.
- BCN20000: Dermoscopic lesions in the wild, 2019.
- Support-vector networks. Machine Learning, 20(3):273–297, 1995.
- Editing factual knowledge in language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6491–6506, Online and Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics.
- Documenting the English Colossal Clean Crawled Corpus. CoRR, abs/2104.08758, 2021.
- Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations. arXiv preprint arXiv:2311.16681, 2023.
- From hope to safety: Unlearning biases of deep models via gradient penalization in latent space. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19):21046–21054, 2024.
- A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation. Advances in Neural Information Processing Systems, 36, 2023.
- Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11):665–673, 2020.
- BadNets: Identifying vulnerabilities in the machine learning model supply chain, 2019.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- FunnyBirds: A synthetic vision dataset for a part-based analysis of explainable AI methods. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3981–3991, 2023.
- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Towards best practice in explaining neural network decisions with LRP. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020.
- Probing classifiers are unreliable for concept removal and detection. In Advances in Neural Information Processing Systems, pages 17994–18008. Curran Associates, Inc., 2022.
- Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- A whac-a-mole dilemma: Shortcuts come in multiples where mitigating one amplifies others. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20071–20082, 2023.
- Advances, challenges and opportunities in creating data for trustworthy AI. Nature Machine Intelligence, 4(8):669–677, 2022.
- Preemptively pruning clever-hans strategies in deep neural networks. Information Fusion, 103:102094, 2024.
- Embedding human knowledge into deep neural network via attention map, 2019.
- Spurious features everywhere: Large-scale detection of harmful spurious features in ImageNet. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20235–20246, 2023.
- Reveal to revise: An explainable AI life cycle for iterative bias correction of deep models. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, pages 596–606, Cham, 2023. Springer Nature Switzerland.
- Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence. arXiv preprint arXiv:2202.03482, 2024.
- Null it out: Guarding protected attributes by iterative nullspace projection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7237–7256, Online, 2020. Association for Computational Linguistics.
- Interpretations are useful: penalizing explanations to align neural networks with prior knowledge. In International Conference on Machine Learning, pages 8116–8126. PMLR, 2020.
- Right for the right reasons: training differentiable models by constraining their explanations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 2662–2670, 2017.
- Stock market prediction using machine learning techniques: a decade survey on methodologies, recent developments, and future directions. Electronics, 10(21):2717, 2021.
- ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
- Editing a classifier by rewriting its prediction rules. In Advances in Neural Information Processing Systems, pages 23359–23373. Curran Associates, Inc., 2021.
- Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nature Machine Intelligence, 2(8):476–486, 2020.
- Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations, 2015.
- EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, pages 6105–6114. PMLR, 2019.
- Explanatory interactive machine learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 239–245, 2019.
- Machine learning and criminal justice: A systematic review of advanced methodology for recidivism risk prediction. International Journal of Environmental Research and Public Health, 19(17):10594, 2022.
- Fast diffusion-based counterfactuals for shortcut removal and generation. arXiv preprint arXiv:2312.14223, 2023.
- Discover and cure: Concept-aware mitigation of spurious correlation. In International Conference on Machine Learning, pages 37765–37786. PMLR, 2023.
Authors: Dilyara Bareeva, Maximilian Dreyer, Frederik Pahde, Wojciech Samek, Sebastian Lapuschkin