
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection (2407.21004v3)

Published 30 Jul 2024 in cs.CL and cs.CV

Abstract: Recent advances show that two-stream approaches have achieved outstanding performance in hateful meme detection. However, hateful memes constantly evolve as new memes emerge by fusing progressive cultural ideas, making existing methods obsolete or ineffective. In this work, we explore the potential of Large Multimodal Models (LMMs) for hateful meme detection. To this end, we propose Evolver, which incorporates LMMs via Chain-of-Evolution (CoE) Prompting, by integrating the evolution attribute and in-context information of memes. Specifically, Evolver simulates the evolving and expressing process of memes and reasons through LMMs in a step-by-step manner. First, an evolutionary pair mining module retrieves the top-k most similar memes in the external curated meme set with the input meme. Second, an evolutionary information extractor is designed to summarize the semantic regularities between the paired memes for prompting. Finally, a contextual relevance amplifier enhances the in-context hatefulness information to boost the search for evolutionary processes. Extensive experiments on public FHM, MAMI, and HarM datasets show that CoE prompting can be incorporated into existing LMMs to improve their performance. More encouragingly, it can serve as an interpretive tool to promote the understanding of the evolution of social memes.


Summary

  • The paper introduces a novel chain-of-evolution framework that integrates evolutionary pair mining, information extraction, and contextual relevance amplification to improve hateful meme detection.
  • It demonstrates significant improvements in detection accuracy and AUC by leveraging large multimodal models for nuanced analysis of evolving hateful content.
  • The approach enhances model interpretability and offers actionable insights for real-time mitigation of harmful meme propagation on digital platforms.

Enhancing Hateful Meme Detection through Chain-of-Evolution Prompting

The detection of hateful memes is a challenging task at the intersection of natural language processing and computer vision, exacerbated by the dynamic and evolving nature of memes. This paper proposes a novel approach, Evolver, utilizing Chain-of-Evolution (CoE) prompting to improve the performance of Large Multimodal Models (LMMs) in identifying hateful memes. This methodology is particularly relevant given the fluid nature of meme culture, which continually absorbs new ideas and cultural symbols, often rendering existing detection methods inadequate.

Methodological Framework

Evolver introduces a systematic approach to incorporate evolutionary dynamics into the hateful meme detection workflow, structured around three main components: evolutionary pair mining, evolutionary information extraction, and contextual relevance amplification.

  1. Evolutionary Pair Mining: This component identifies memes that have evolved from earlier memes or cultural concepts. Drawing on a curated external meme set, the method retrieves the top-K memes most similar to the target meme using textual and visual embeddings. This step is crucial for understanding how memes have historically morphed into hateful content.
  2. Evolutionary Information Extraction: This step leverages LMMs to summarize semantic regularities between paired memes. Through strategic prompt design, it extracts characteristics of hatefulness, utilizing instructions that reflect specific hateful content guidelines.
  3. Contextual Relevance Amplifier: By enhancing the focus on hateful components and combining evolutionary insights, this element strengthens the model's detection capabilities, ensuring that nuanced and contextually embedded hatefulness is not overlooked.
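
The retrieval-and-prompting stages above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the embedding source (e.g. CLIP-style joint image-text features), the prompt wording, and the function names are all assumptions.

```python
import numpy as np

def top_k_similar(query_emb, meme_embs, k=3):
    """Return indices of the k curated memes most similar to the query,
    ranked by cosine similarity over (assumed) joint image-text embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    m = meme_embs / np.linalg.norm(meme_embs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity to each curated meme
    return np.argsort(sims)[::-1][:k]  # highest-similarity indices first

def build_coe_prompt(target_text, pair_texts):
    """Assemble a chain-of-evolution prompt from the retrieved pairs
    (illustrative wording only)."""
    context = "\n".join(f"- {t}" for t in pair_texts)
    return (
        "Memes similar to the target, ordered by similarity:\n"
        f"{context}\n"
        "Summarize the hateful regularities these memes share, then decide "
        f"step by step whether the target meme is hateful: {target_text}"
    )

# Toy example with random embeddings standing in for real meme features
rng = np.random.default_rng(0)
memes = rng.normal(size=(10, 8))
query = memes[4] + 0.01 * rng.normal(size=8)  # near-duplicate of curated meme 4
idx = top_k_similar(query, memes, k=3)
prompt = build_coe_prompt("target caption", [f"caption {i}" for i in idx])
```

In the toy run, the near-duplicate meme ranks first, and the resulting prompt interleaves the retrieved pairs with the target, mirroring the step-by-step reasoning the paper describes.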

Empirical Analysis

Comprehensive experiments on publicly available datasets, including FHM, MAMI, and HarM, demonstrate that incorporating CoE prompting into LMMs substantially enhances their performance in hateful meme detection. Noteworthy improvements in accuracy and AUC metrics were observed, illustrating Evolver's efficacy over traditional two-stream methods and baseline LMMs. Notably, the introduction of the CoE framework not only boosts detection accuracy but also imbues the models with interpretative capabilities, facilitating a deeper understanding of meme evolution.
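
For reference, the two reported metrics are standard for binary hatefulness classification; the sketch below shows how they are computed over labels and model scores. The data here is illustrative, not from the paper.

```python
def accuracy(labels, preds):
    """Fraction of memes whose predicted label matches the gold label."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)

def auc(labels, scores):
    """Rank-based AUC: probability that a randomly chosen hateful meme
    scores higher than a randomly chosen benign one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative gold labels (1 = hateful) and model scores
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5]
preds = [int(s >= 0.5) for s in scores]
print(accuracy(labels, preds))  # 0.6
print(auc(labels, scores))      # ~0.833
```

AUC is threshold-free, so it captures ranking quality even when the 0.5 decision threshold is poorly calibrated, which is why papers in this area typically report both.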

Theoretical and Practical Implications

The implications of this paper are twofold. Theoretically, the integration of chain-of-evolution reasoning into multimodal models enriches our understanding of how memes propagate and transform in digital cultures. This approach opens avenues for future research in the comprehensive modeling of cultural evolution within machine learning frameworks.

Practically, Evolver offers a robust tool for platforms seeking to mitigate the spread of harmful content. By enhancing detection models with evolutionary reasoning, social media companies can better anticipate and flag emerging forms of hate speech embedded within meme culture.

Future Directions

While Evolver marks a significant step in hateful meme detection, further research is warranted to refine these models. Future studies might explore adaptive learning mechanisms that update evolutionary pair mining and semantic extraction processes in real-time, reflecting the ever-changing landscape of internet memes. Additionally, improving the granularity of visual-textual alignment and diversifying the curated meme datasets could enhance the robustness and cultural relevance of LMMs.

In conclusion, Evolver addresses the complex challenge of hateful meme detection by bridging static detection techniques with dynamic, evolution-aware methodologies. Its application of chain-of-evolution prompting marks a transformative approach, offering both a more effective detection mechanism and a lens through which to view the cultural processes underlying meme dissemination.
