Counterfactual contrastive learning: robust representations via causal image synthesis (2403.09605v2)
Abstract: Contrastive pretraining is well known to improve downstream task performance and model generalisation, especially in limited-label settings. However, it is sensitive to the choice of augmentation pipeline: positive pairs should preserve semantic information while destroying domain-specific information. Standard augmentation pipelines emulate domain-specific changes with pre-defined photometric transformations, but what if we could simulate realistic domain changes instead? In this work, we show how to utilise recent progress in counterfactual image generation to this end. We propose CF-SimCLR, a counterfactual contrastive learning approach that leverages approximate counterfactual inference for positive pair creation. Comprehensive evaluation across five datasets, on chest radiography and mammography, demonstrates that CF-SimCLR substantially improves robustness to acquisition shift, with higher downstream performance on both in- and out-of-distribution data, particularly for domains that are under-represented during training.
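The positive-pair idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `counterfactual_fn` and `augment_fn` are hypothetical placeholders (a real counterfactual generator would be a trained causal image model), and pairing an augmented real image with an augmented counterfactual of the same image is an assumption about how the pairs are formed. The loss is the standard NT-Xent objective used in SimCLR.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.1):
    """Standard NT-Xent (SimCLR) loss for N positive pairs (z_a[i], z_b[i])."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Row i (first view) is paired with row n + i (second view), and vice versa.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denominator
    return float(-log_prob_pos.mean())


def make_positive_pairs(images, counterfactual_fn, augment_fn, rng):
    """Assumed pairing: (augmented real image, augmented counterfactual)."""
    view_a = np.stack([augment_fn(x, rng) for x in images])
    view_b = np.stack([augment_fn(counterfactual_fn(x, rng), rng) for x in images])
    return view_a, view_b


# Hypothetical stand-ins, for illustration only.
def toy_counterfactual(x, rng):
    # Mimics a scanner change with a global contrast/brightness shift.
    return 1.2 * x + 0.1


def toy_augment(x, rng):
    # Mimics a standard augmentation with small additive noise.
    return x + rng.normal(scale=0.01, size=x.shape)


rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8, 8))
view_a, view_b = make_positive_pairs(images, toy_counterfactual, toy_augment, rng)
# A real pipeline would pass the views through an encoder; here we just flatten.
loss = nt_xent_loss(view_a.reshape(4, -1), view_b.reshape(4, -1))
```

In this sketch the only change relative to plain SimCLR is where the second view comes from, so the encoder and loss could stay untouched while the positive pair carries a simulated domain change.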
Authors: Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara, Ben Glocker, Melanie Roschewitz