Learning Invariant Causal Mechanism from Vision-Language Models (2405.15289v4)
Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, but its performance can degrade when fine-tuned in out-of-distribution (OOD) scenarios. We model the prediction process using a Structural Causal Model (SCM) and show that the causal mechanism involving both invariant and variant factors in training environments differs from that in test environments. In contrast, the causal mechanism with solely invariant factors remains consistent across environments. We theoretically prove the existence of a linear mapping from CLIP embeddings to invariant factors, which can be estimated using interventional data. Additionally, we provide a condition to guarantee low OOD risk of the invariant predictor. Based on these insights, we propose the Invariant Causal Mechanism of CLIP (CLIP-ICM) framework. CLIP-ICM involves collecting interventional data, estimating a linear projection matrix, and making predictions within the invariant subspace. Experiments on several OOD datasets show that CLIP-ICM significantly improves the performance of CLIP. Our method offers a simple but powerful enhancement, boosting the reliability of CLIP in real-world applications.
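The three steps named in the abstract (collect interventional data, estimate a linear projection, predict in the invariant subspace) can be illustrated with a small NumPy sketch. This is not the paper's estimator. It assumes, purely for illustration, that interventional data come as pairs of embeddings that share invariant factors but differ in variant ones, and it takes the projection to be the `k` directions along which those pairs differ least. The synthetic embeddings, the orthogonal mixing matrix `mix`, and the function names are all hypothetical stand-ins for real CLIP features.

```python
import numpy as np


def estimate_invariant_projection(z, z_int, k):
    """Return a (d, k) orthonormal basis of the k directions that change least
    between paired embeddings z and z_int (assumed estimator, not the paper's)."""
    diff = z_int - z                                # change caused by each intervention
    second_moment = diff.T @ diff / diff.shape[0]   # (d, d), symmetric PSD
    _, eigvecs = np.linalg.eigh(second_moment)      # eigenvalues in ascending order
    return eigvecs[:, :k]                           # smallest-change directions


def predict_in_subspace(z_img, z_txt, proj):
    """Zero-shot style prediction: cosine similarity after projection."""
    img = z_img @ proj
    txt = z_txt @ proj
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    return np.argmax(img @ txt.T, axis=1)


# Synthetic stand-in for CLIP embeddings (hypothetical data, not real features).
rng = np.random.default_rng(0)
n, d, k = 300, 8, 3
mix = np.linalg.qr(rng.normal(size=(d, d)))[0]      # random orthogonal mixing
prototypes = 3.0 * np.eye(k)                        # one invariant prototype per class
labels = rng.integers(0, k, size=n)
inv = prototypes[labels] + 0.05 * rng.normal(size=(n, k))
var = 5.0 * rng.normal(size=(n, d - k))             # variant factors, original
var_int = 5.0 * rng.normal(size=(n, d - k))         # variant factors after intervention

z = np.hstack([inv, var]) @ mix                     # original "image" embeddings
z_int = np.hstack([inv, var_int]) @ mix             # intervened "image" embeddings
z_txt = np.hstack([prototypes, np.zeros((k, d - k))]) @ mix  # "text" embeddings

proj = estimate_invariant_projection(z, z_int, k)
pred = predict_in_subspace(z_int, z_txt, proj)
accuracy = float(np.mean(pred == labels))
```

In this toy setting the variant factors dominate the raw embeddings, so the point of the projection is to discard the directions that the interventions move before comparing image and text features. With real CLIP features the mixing is not known to be orthogonal and the choice of `k` would have to be validated.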