Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models (2401.13298v1)
Abstract: The age of social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones. This task is challenging because of the implicit meaning embedded in memes, which is not explicitly conveyed through their surface text and image. However, existing harmful meme detection methods do not produce readable explanations that unveil this implicit meaning to support their detection decisions. In this paper, we propose an explainable approach to harmful meme detection, achieved by reasoning over conflicting rationales from both harmless and harmful positions. Specifically, inspired by the strong capacity of LLMs for text generation and reasoning, we first elicit a multimodal debate between LLMs to generate explanations derived from the contradictory arguments. We then fine-tune a small LLM as the debate judge for harmfulness inference, facilitating multimodal fusion between the harmfulness rationales and the intrinsic multimodal information within memes. In this way, our model is empowered to perform dialectical reasoning over intricate and implicit harm-indicative patterns, utilizing multimodal explanations originating from both harmless and harmful arguments. Extensive experiments on three public meme datasets demonstrate that our approach achieves substantially better detection performance than state-of-the-art methods and exhibits a superior capacity for explaining the harmfulness of memes in its predictions.
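The debate-then-judge pipeline sketched in the abstract can be illustrated with a minimal, hedged code sketch. This is not the paper's implementation: `call_llm` is a hypothetical stand-in for a real LLM API (it returns a canned string so the sketch runs offline), and the prompt wording, function names, and fusion format are all illustrative assumptions.

```python
# Hypothetical sketch of the debate-then-judge pipeline: two LLM "debaters"
# argue opposite harmfulness positions, and their rationales are fused with
# the meme's own text and image caption into an input for a small judge model.

def call_llm(prompt: str) -> str:
    # Placeholder: a deployed system would query a large LLM here
    # (e.g. via an API call). Returns a canned rationale for illustration.
    return f"Rationale for: {prompt.splitlines()[0]}"

def debate(meme_text: str, image_caption: str) -> dict:
    """Elicit contradictory rationales from the two debating positions."""
    context = f"Meme text: {meme_text}\nImage caption: {image_caption}"
    return {
        "harmless": call_llm(f"Argue that this meme is HARMLESS.\n{context}"),
        "harmful": call_llm(f"Argue that this meme is HARMFUL.\n{context}"),
    }

def judge_input(meme_text: str, image_caption: str, rationales: dict) -> str:
    """Fuse the meme's intrinsic multimodal content with both rationales.

    The resulting string would be fed to a small fine-tuned LLM acting as
    the debate judge for the final harmfulness decision.
    """
    return (
        f"Meme text: {meme_text}\n"
        f"Image caption: {image_caption}\n"
        f"Harmless argument: {rationales['harmless']}\n"
        f"Harmful argument: {rationales['harmful']}\n"
        "Question: Is this meme harmful?"
    )

# Example usage with a toy meme:
rationales = debate("some overlaid caption", "a person holding a sign")
prompt = judge_input("some overlaid caption", "a person holding a sign", rationales)
```

In a full system, the judge would be a small LLM fine-tuned on such fused inputs, so that the final prediction is grounded in both the meme itself and the conflicting explanations.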
Authors: Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, Ruichao Yang