Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models

Published 9 Dec 2023 in cs.CL | (2312.05434v1)

Abstract: The age of social media is rife with memes. Understanding and detecting harmful memes pose a significant challenge due to their implicit meaning, which is not explicitly conveyed through the surface text and image. However, existing harmful meme detection approaches only recognize superficial harm-indicative signals in an end-to-end classification manner but ignore in-depth cognition of the meme text and image. In this paper, we attempt to detect harmful memes based on advanced reasoning over the interplay of multimodal information in memes. Inspired by the success of LLMs on complex reasoning, we first conduct abductive reasoning with LLMs. Then we propose a novel generative framework to learn reasonable thoughts from LLMs for better multimodal fusion and lightweight fine-tuning, which consists of two training stages: 1) distill multimodal reasoning knowledge from LLMs; and 2) fine-tune the generative framework to infer harmfulness. Extensive experiments conducted on three meme datasets demonstrate that our proposed approach outperforms state-of-the-art methods on the harmful meme detection task.


Summary

  • The paper introduces Mr.Harm, which uses multimodal reasoning distilled from LLMs to detect harmful memes with enhanced accuracy.
  • It employs a two-stage training approach combining abductive reasoning and reasoning distillation to extract deep semantic cues from both text and images.
  • Empirical results show significant macro-F1 score improvements on datasets like Harm-C, Harm-P, and FHM compared to traditional detection methods.

Unveiling Harmful Memes with Multimodal Reasoning

The paper "Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from LLMs" presents an innovative approach for detecting harmful memes. It leverages multimodal reasoning distilled from LLMs to capture the implicit meaning of memes, aiming to improve detection performance over traditional methods that rely heavily on superficial signals in images and text.

Introduction

The proliferation of social media has magnified the role of memes as powerful vehicles for communication, while also making them effective carriers of harm through the subtle interplay of imagery and text. Existing methods predominantly rely on end-to-end classification and fail to delve into the semantic nuances required to distinguish harmful from harmless content. To address this, the authors propose Mr.Harm, a method that integrates advanced reasoning capabilities from LLMs. It uses a two-stage framework for multimodal reasoning: extracting reasoning knowledge from LLMs, then fine-tuning a smaller language model for practical deployment.

Methodology

Multimodal Reasoning Framework

The method begins by prompting LLMs to perform abductive reasoning: given a meme's surface text and a caption describing its image, the LLM generates rationales that elucidate why the meme is or is not harmful. These rationales capture complex contextual and cultural information that is otherwise inaccessible to simpler models trained for classification alone (Figure 1).

Figure 1: The overall pipeline of our method. We first conduct abductive reasoning with LLMs to extract harmfulness rationales using meme text and image captions.
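As a rough illustration, the abductive-reasoning step amounts to assembling a prompt from the meme's surface signals and the known label, then asking the LLM to explain the judgment. The template wording, function name, and example meme below are hypothetical stand-ins, not the paper's actual prompt:

```python
def build_abductive_prompt(meme_text: str, image_caption: str, is_harmful: bool) -> str:
    """Assemble an abductive-reasoning prompt from a meme's text and image caption.

    The wording is a hypothetical simplification of the paper's template.
    """
    verdict = "harmful" if is_harmful else "harmless"
    return (
        "A meme consists of overlaid text and an image.\n"
        f"Meme text: {meme_text}\n"
        f"Image caption: {image_caption}\n"
        f"This meme is {verdict}. Explain step by step, using the interplay "
        "of the text and the image, why that judgment holds.\n"
        "Rationale:"
    )

prompt = build_abductive_prompt(
    meme_text="Stay home, save lives",
    image_caption="a crowded beach during summer",
    is_harmful=False,
)
print(prompt.splitlines()[1])  # prints "Meme text: Stay home, save lives"
```

Because the ground-truth label is given in the prompt, the LLM reasons backward from the outcome to an explanation, which is what makes the generated rationales usable as distillation targets.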

Generative Framework

The proposed generative model is split into two distinct training stages:

  1. Reasoning Distillation - Fine-tunes a smaller language model to absorb reasoning paths from the LLM teacher, grounding it in rich multimodal representations for robust detection.
  2. Harmfulness Inference - Utilizes distilled multimodal knowledge to generate final harmfulness predictions for given meme content.
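In sketch form, the two stages differ in the target sequence the student model learns to generate: LLM rationales in stage 1, harmfulness labels in stage 2. The function and stage names below are illustrative simplifications, not the paper's implementation:

```python
def make_target(stage: str, rationale: str, is_harmful: bool) -> str:
    """Pick the training target for the student model at each stage.

    Stage names and target formats are hypothetical simplifications.
    """
    if stage == "distill":
        # Stage 1: reasoning distillation -- reproduce the teacher LLM's rationale
        return rationale
    if stage == "infer":
        # Stage 2: harmfulness inference -- generate the final label token
        return "harmful" if is_harmful else "harmless"
    raise ValueError(f"unknown stage: {stage}")
```

In both stages the input is the same fused representation of meme text and image; only the decoding objective changes, which is what lets the lightweight student inherit the LLM's reasoning before being specialized for the detection decision.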

Experimental Results

Empirical evaluation is conducted on three public meme datasets (Harm-C, Harm-P, and FHM), with Mr.Harm demonstrating considerable performance gains. Notably, it achieves substantial improvements in macro-F1 scores, especially on datasets where traditional models struggle with the nuanced nature of memes (Figure 2).

Figure 2: Examples of correctly predicted harmful memes in (a) Harm-C, (b) Harm-P, and (c) FHM dataset.
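Macro-F1, the headline metric here, averages the per-class F1 scores so that the rarer class (typically "harmful") counts equally with the majority class. A minimal self-contained computation for the binary case:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 for binary labels (0 = harmless, 1 = harmful)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall, guarded against division by zero
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)
```

A classifier that simply predicts "harmless" for every meme can score a high accuracy on imbalanced data but a poor macro-F1, which is why the paper reports the latter.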

Ablation Studies

A detailed ablation study underscores the significance of each component, emphasizing the critical role of the multimodal reasoning distilled from LLMs. Removing elements such as reasoning distillation or fine-tuning leads to substantial performance drops, confirming that each stage contributes materially to the final result.

Error Analysis

The paper acknowledges areas for improvement, specifically the misrecognition of images whose interpretation requires extensive background knowledge. This suggests opportunities for enhancement via the integration of broader knowledge sources and more sophisticated visual representations (Figure 3).

Figure 3: Examples of wrongly predicted memes by our proposed framework with the ground truth (a) harmful and (b) harmless.

Discussion

The paper opens avenues for future exploration into explainability and generalization of harmful meme detection frameworks. It suggests incorporating visual LLMs to enrich visual features and improve the distillation of multimodal reasoning.

Conclusion

The research offers a significant advancement in harmful meme detection by moving beyond surface-level interpretation, employing LLMs for a comprehensive understanding of meme semantics. This approach not only improves detection accuracy but also provides a foundation for more robust AI systems capable of nuanced multimodal reasoning (Figure 4).

Figure 4: The details of our Multimodal Fusion module.

The implications of this work extend to broader AI applications, particularly monitoring and counteracting disinformation on digital platforms. Future efforts will likely focus on enhancing explainability and refining multimodal synergies to fully leverage LLM capabilities.