
Debiasing Multimodal Models via Causal Information Minimization (2311.16941v1)

Published 28 Nov 2023 in cs.LG, cs.AI, cs.CL, cs.CV, and stat.ME

Abstract: Most existing debiasing methods for multimodal models, including causal intervention and inference methods, utilize approximate heuristics to represent the biases, such as shallow features from early stages of training or unimodal features for multimodal tasks like VQA, etc., which may not be accurate. In this paper, we study bias arising from confounders in a causal graph for multimodal data and examine a novel approach that leverages causally-motivated information minimization to learn the confounder representations. Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data. Hence, minimizing the information content of features obtained from a pretrained biased model helps learn the simplest predictive features that capture the underlying data distribution. We treat these features as confounder representations and use them via methods motivated by causal theory to remove bias from models. We find that the learned confounder representations indeed capture dataset biases, and the proposed debiasing methods improve out-of-distribution (OOD) performance on multiple multimodal datasets without sacrificing in-distribution performance. Additionally, we introduce a novel metric to quantify the sufficiency of spurious features in models' predictions that further demonstrates the effectiveness of our proposed methods. Our code is available at: https://github.com/Vaidehi99/CausalInfoMin
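The abstract describes the approach only at a high level, so below is a minimal, hypothetical sketch of the information-minimization idea in PyTorch: a small variational bottleneck is fit on features taken from a frozen, biased pretrained model, so the bottleneck retains only the simplest predictive (typically spurious) signal, which can then be treated as a confounder representation. All names and hyperparameters here (BIASED_DIM, Z_DIM, BETA, the VIB-style KL penalty) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of information minimization over biased features.
# Assumptions (not from the paper): a frozen biased model already provides
# `biased_feat` vectors of size BIASED_DIM; a VIB-style KL penalty stands in
# for the paper's causally-motivated information-minimization objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

BIASED_DIM, Z_DIM, N_ANSWERS, BETA = 768, 32, 3129, 1e-3  # illustrative values

class ConfounderEncoder(nn.Module):
    """Low-capacity bottleneck over features from the biased pretrained model."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Linear(BIASED_DIM, Z_DIM)
        self.logvar = nn.Linear(BIASED_DIM, Z_DIM)
        self.head = nn.Linear(Z_DIM, N_ANSWERS)

    def forward(self, biased_feat):
        mu, logvar = self.mu(biased_feat), self.logvar(biased_feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.head(z), mu, logvar

def confounder_loss(logits, labels, mu, logvar, beta=BETA):
    # Cross-entropy keeps z predictive of the answer; the KL term toward N(0, I)
    # limits how much information z retains about the input, pushing z toward
    # the simplest predictive (often spurious) signal in the biased features.
    ce = F.cross_entropy(logits, labels)
    kl = -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())
    return ce + beta * kl
```

At inference, one causally motivated way to use the learned z is to ensemble or subtract the confounder head's logits from the main multimodal model's logits; the paper's exact adjustment follows its causal graph and is specified in the linked repository.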

Authors (3)
  1. Vaidehi Patil (9 papers)
  2. Adyasha Maharana (13 papers)
  3. Mohit Bansal (304 papers)
Citations (1)
