
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes (2308.11585v2)

Published 19 Aug 2023 in cs.AI and cs.CL

Abstract: Amidst the rapid expansion of Machine Learning (ML) and Large Language Models (LLMs), understanding the semantics within their mechanisms is vital. Causal analyses define semantics, while gradient-based methods are essential to eXplainable AI (XAI), interpreting the model's 'black box'. Integrating these, we investigate how a model's mechanisms reveal its causal effect on evidence-based decision-making. Research indicates that intersectionality, the combined impact of an individual's demographics, can be framed as an Average Treatment Effect (ATE). This paper demonstrates that hateful meme detection can be viewed as an ATE estimation using intersectionality principles, and that summarized gradient-based attention scores highlight distinct behaviors of three Transformer models. We further reveal that the LLM Llama-2 can discern the intersectional aspects of the detection through in-context learning, and that the learning process could be explained via meta-gradient, a secondary form of gradient. In conclusion, this work furthers the dialogue on Causality and XAI. Our code is available online (see External Resources section).
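
As a reading aid for the ATE mentioned in the abstract: in its simplest textbook form, an Average Treatment Effect is the difference between the mean outcome of treated units and the mean outcome of untreated units. The sketch below shows only that generic difference-in-means estimate, assuming a binary treatment flag. The function name and the toy numbers are hypothetical, and this is not the paper's estimator or data.

```python
from statistics import mean


def average_treatment_effect(outcomes, treated):
    """Generic difference-in-means estimate: mean(Y | T=1) - mean(Y | T=0).

    `outcomes` and `treated` are parallel sequences; `treated` holds
    truthy values for treated units and falsy values for control units.
    """
    treated_y = [y for y, t in zip(outcomes, treated) if t]
    control_y = [y for y, t in zip(outcomes, treated) if not t]
    if not treated_y or not control_y:
        raise ValueError("need at least one treated and one control unit")
    return mean(treated_y) - mean(control_y)


# Hypothetical toy data (not from the paper): model scores for six inputs,
# three with a "treatment" applied and three without.
scores = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1]
is_treated = [1, 1, 1, 0, 0, 0]

ate = average_treatment_effect(scores, is_treated)
print(f"estimated ATE: {ate:.2f}")
```

The difference-in-means form is only unbiased when treatment assignment is independent of the outcome, which is an assumption of this sketch rather than a claim about the paper's setup.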

Authors (2)
  1. Yosuke Miyanishi (3 papers)
  2. Minh Le Nguyen (17 papers)
Citations (2)