BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes (2404.03022v2)

Published 3 Apr 2024 in cs.CL, cs.CV, cs.IT, cs.LG, and math.IT

Abstract: Memes, combining text and images, frequently use metaphors to convey persuasive messages, shaping public opinion. Motivated by this, our team engaged in SemEval-2024 Task 4, a hierarchical multi-label classification task designed to identify rhetorical and psychological persuasion techniques embedded within memes. To tackle this problem, we introduced a caption generation step to assess the modality gap and the impact of additional semantic information from images, which improved our result. Our best model utilizes GPT-4 generated captions alongside meme text to fine-tune RoBERTa as the text encoder and CLIP as the image encoder. It outperforms the baseline by a large margin in all 12 subtasks. In particular, it ranked in top-3 across all languages in Subtask 2a, and top-4 in Subtask 2b, demonstrating quantitatively strong performance. The improvement achieved by the introduced intermediate step is likely attributable to the metaphorical essence of images that challenges visual encoders. This highlights the potential for improving abstract visual semantics encoding.

References (35)
  1. LM-CPPF: Paraphrasing-guided data augmentation for contrastive prompt-based few-shot fine-tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 670–681, Toronto, Canada. Association for Computational Linguistics.
  2. LION: Empowering multimodal large language model with dual-level visual knowledge. arXiv preprint arXiv:2311.11860.
  3. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality.
  4. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  5. SemEval-2024 task 4: Multilingual detection of persuasion techniques in memes. In Proceedings of the 18th International Workshop on Semantic Evaluation, SemEval 2024, Mexico City, Mexico.
  6. SemEval-2021 task 6: Detection of persuasion techniques in texts and images. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 70–98, Online. Association for Computational Linguistics.
  7. DeBERTa: Decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations.
  8. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
  9. EunJeong Hwang and Vered Shwartz. 2023. MemeCap: A dataset for captioning and interpreting memes. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 1433–1445, Singapore. Association for Computational Linguistics.
  10. BRAINTEASER: Lateral thinking puzzles for large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14317–14332, Singapore. Association for Computational Linguistics.
  11. Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  12. Learning and evaluation in the presence of class hierarchies: Application to text categorization. In Advances in Artificial Intelligence: 19th Conference of the Canadian Society for Computational Studies of Intelligence, Canadian AI 2006, Québec City, Québec, Canada, June 7-9, 2006. Proceedings 19, pages 395–406. Springer.
  13. Anushka Kulkarni. 2017. Internet meme and political discourse: A study on the impact of internet meme as a tool in communicating political satire. Journal of Content, Community & Communication Amity School of Communication, 6.
  14. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597.
  15. VisualBERT: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557.
  16. Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81.
  17. Visual instruction tuning. In NeurIPS.
  18. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  19. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft.
  20. AIMH at SemEval-2021 task 6: Multimodal classification using an ensemble of transformer models. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 1020–1026, Online. Association for Computational Linguistics.
  21. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582.
  22. GPT-4 technical report.
  23. Seokmok Park and Joonki Paik. 2023. RefCap: Image captioning with referent objects attributes. Scientific Reports, 13(1):21577.
  24. Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191, Belgium, Brussels. Association for Computational Linguistics.
  25. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR.
  26. Targeted adversarial attacks against neural machine translation. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE.
  27. Comparing encoder-only and encoder-decoder transformers for relation extraction from biomedical texts: An empirical study on ten benchmark datasets. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 376–382, Dublin, Ireland. Association for Computational Linguistics.
  28. MMF: A multimodal framework for vision and language research. https://github.com/facebookresearch/mmf.
  29. Text classification via large language models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 8990–9005, Singapore. Association for Computational Linguistics.
  30. Attention is all you need. Advances in neural information processing systems, 30.
  31. Ben Wasike. 2022. Memes, memes, everywhere, nor any meme to trust: Examining the credibility and persuasiveness of covid-19-related memes. Journal of Computer-Mediated Communication, 27(2):zmab024.
  32. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
  33. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
  34. On evaluating adversarial robustness of large vision-language models. Advances in Neural Information Processing Systems, 36.
  35. ChatBridge: Bridging modalities with large language model as a language catalyst. arXiv preprint arXiv:2305.16103.

Summary

  • The paper proposes a novel intermediate caption generation step using GPT-4 to integrate textual and visual features effectively.
  • The study compares models like RoBERTa and LLaVA-1.5 to achieve robust, multilingual classification of persuasion techniques in memes.
  • The results demonstrate that generated captions significantly reduce modality gaps and improve detection of nuanced rhetorical strategies.

Multimodal and Multilingual Exploration of Persuasion Techniques in Memes

Introduction

Memes have emerged as a potent form of communication, particularly in shaping public opinion through persuasion techniques. In SemEval-2024 Task 4, the BCAmirs team tackles a hierarchical multi-label classification problem: identifying the rhetorical and psychological persuasion techniques embedded in memes. Their approach introduces a novel meme caption generation step to bridge the modality gap between textual and visual information, substantially improving performance across all 12 subtasks. Since the task is scored hierarchically, the sketch below illustrates what that means in practice.
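
In hierarchical multi-label evaluation, in the spirit of the class-hierarchy metrics of Kiritchenko et al. (reference 12), predicted and gold label sets are expanded with all their ancestors in the technique hierarchy before computing micro-averaged F1. The following is a minimal sketch; the tiny PARENT map is a hypothetical fragment, not the task's actual hierarchy.

```python
# Hierarchical micro-F1 sketch: expand labels with ancestors, then micro-F1.
PARENT = {  # child -> parent; hypothetical fragment of the hierarchy
    "Name calling": "Ad Hominem",
    "Ad Hominem": "Ethos",
}

def with_ancestors(labels: set[str]) -> set[str]:
    """Add every ancestor of each label to the set."""
    expanded = set(labels)
    for label in labels:
        while label in PARENT:
            label = PARENT[label]
            expanded.add(label)
    return expanded

def hierarchical_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1 over ancestor-expanded label sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = with_ancestors(g), with_ancestors(p)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

Under this scheme, predicting a correct parent technique earns partial credit even when the exact leaf label is missed.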

Background on Modality Gap and Persuasion Technique Classification

The disparity between visual and textual modalities, or the modality gap, has been a focal point of recent studies aiming to enhance multimodal LLMs (MLLMs). Works such as ChatBridge and LION have pushed the boundaries of modality bridging, demonstrating advances across multimodal tasks. Similarly, prior studies of persuasion techniques in memes underscore the importance of detecting nuanced rhetorical strategies within multimodal content.

Methodology

BCAmirs' methodology centers on an intermediate caption generation step that uses models such as GPT-4 to supply additional semantic information about the image. The approach compares various models, including language representation models (LRMs) like RoBERTa and MLLMs such as LLaVA-1.5, shedding light on how different combinations of textual and visual features influence the detection of persuasion techniques.

Caption Generation

The first step uses models such as GPT-4 and LLaVA-1.5 to caption each meme. The generated captions are intended to encapsulate the essence of the meme, covering both its textual content and its metaphorical imagery. This step plays a critical role in bridging the gap between the textual and visual modalities, supplying richer input for the downstream classification task.
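
A minimal sketch of such a captioning call is shown below, assuming the OpenAI Python client (>= 1.0) and a vision-capable GPT-4 model; the model name and prompt wording here are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch of GPT-4-based meme captioning (model name and prompt are
# assumptions; the paper's exact prompting setup may differ).
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_meme_caption(image_path: str, meme_text: str) -> str:
    """Ask a vision-capable GPT-4 model to describe the meme's imagery."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable GPT-4 variant
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Describe this meme's image and its metaphorical "
                          f"meaning. The overlaid text reads: {meme_text}")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        max_tokens=128,
    )
    return response.choices[0].message.content
```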

Persuasion Technique Classification

Following caption generation, various models are employed to classify persuasion techniques, examining the effect of incorporating generated captions alongside meme text and images. The BCAmirs team explores configurations using only text, text plus images, and text with generated captions, among others, to quantify the modality gap and the value of the additional semantic information.
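
For the text-based configurations, a minimal sketch of multi-label fine-tuning with a Hugging Face RoBERTa checkpoint follows; the roberta-base checkpoint, the label count, and feeding meme text and caption as a sentence pair are assumptions for illustration, and actual fine-tuning on the task data is omitted.

```python
# Hedged sketch of multi-label persuasion-technique classification with
# RoBERTa; checkpoint, label count, and input pairing are assumptions.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

NUM_TECHNIQUES = 22  # assumption: size of the task's persuasion-label set

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=NUM_TECHNIQUES,
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss
)

def classify(meme_text: str, caption: str, threshold: float = 0.5):
    """Return indices of predicted techniques (model assumed fine-tuned)."""
    # Feed meme text and the generated caption as a sentence pair.
    inputs = tokenizer(meme_text, caption, return_tensors="pt",
                       truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)
    return (probs > threshold).nonzero(as_tuple=True)[0].tolist()
```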

Experiments and Results

The team's experiments highlight the effectiveness of the method, particularly with the ConcatRoBERTa model, which combines meme text, the image, and GPT-4-generated captions. This approach outperforms the baselines and performs robustly across languages in the hierarchical classification tasks. The results suggest that the additional semantic information in the captions, especially those generated by GPT-4, significantly aids classification.
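
A concatenation-based fusion model in the spirit of ConcatRoBERTa might look like the following sketch; the checkpoints, pooling choices, and head architecture are assumptions, not the authors' released implementation.

```python
# Hedged sketch of ConcatRoBERTa-style fusion: RoBERTa encodes meme text
# plus the generated caption, CLIP encodes the image, and the pooled
# features are concatenated before a multi-label head.
import torch
import torch.nn as nn
from transformers import RobertaModel, CLIPVisionModel

class ConcatFusionClassifier(nn.Module):
    def __init__(self, num_labels: int = 22):  # label count is an assumption
        super().__init__()
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        self.image_encoder = CLIPVisionModel.from_pretrained(
            "openai/clip-vit-base-patch32")
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.image_encoder.config.hidden_size)  # 768 + 768
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 512), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(512, num_labels),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]            # <s> token embedding
        image_feat = self.image_encoder(
            pixel_values=pixel_values
        ).pooler_output                      # CLIP vision pooled output
        return self.classifier(torch.cat([text_feat, image_feat], dim=-1))
```

Keeping the two encoders separate and fusing only at the feature level lets the text branch absorb the generated caption, which is where the reported gains over image-only encoding come from.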

Conclusion and Future Directions

The BCAmirs team's work offers a novel perspective on classifying persuasion techniques in memes, leveraging the power of generated captions to enhance multimodal classification. Their findings suggest that addressing the modality gap through caption generation can improve the detection of nuanced persuasion techniques. Future research avenues include exploring more advanced models for caption generation and extending the analysis to a broader range of low-resource languages.

This exploration into persuasion techniques within memes signifies a step forward in understanding and mitigating the impact of disinformation campaigns. By enhancing the classification of memes, the research opens doors to more effective tools for recognizing and countering misleading content across social platforms.
