BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes (2404.03022v2)
Abstract: Memes, combining text and images, frequently use metaphors to convey persuasive messages, shaping public opinion. Motivated by this, our team engaged in SemEval-2024 Task 4, a hierarchical multi-label classification task designed to identify rhetorical and psychological persuasion techniques embedded within memes. To tackle this problem, we introduced a caption generation step to assess the modality gap and the impact of additional semantic information from images, which improved our results. Our best model uses GPT-4-generated captions alongside meme text to fine-tune RoBERTa as the text encoder, with CLIP as the image encoder. It outperforms the baseline by a large margin in all 12 subtasks. In particular, it ranked in the top 3 across all languages in Subtask 2a and in the top 4 in Subtask 2b, demonstrating quantitatively strong performance. The improvement achieved by this intermediate captioning step is likely attributable to the metaphorical nature of meme images, which challenges visual encoders; this highlights the potential for improving the encoding of abstract visual semantics.
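The abstract describes the architecture only at a high level. As a rough illustration, the sketch below wires a RoBERTa text encoder (over the meme text paired with a generated caption) and a CLIP vision encoder into a single multi-label classifier. The checkpoint names, concatenation-based fusion, flat 22-way sigmoid head (the actual task is hierarchical), and the `MemePersuasionClassifier` name are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of the dual-encoder setup described in the abstract:
# meme text plus a GPT-4-generated caption go through RoBERTa, the meme
# image goes through CLIP's vision tower, and the pooled features are
# fused for multi-label classification. Checkpoints, fusion by
# concatenation, and the label count are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer, CLIPVisionModel

class MemePersuasionClassifier(nn.Module):
    def __init__(self, num_labels: int = 22):  # 22 is a placeholder label count
        super().__init__()
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        self.image_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.image_encoder.config.hidden_size)
        self.classifier = nn.Linear(fused_dim, num_labels)

    def forward(self, input_ids, attention_mask, pixel_values):
        # Pooled representation of "<meme text> </s></s> <caption>"
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        image_feat = self.image_encoder(pixel_values=pixel_values).pooler_output
        # Concatenate modalities and score each persuasion technique independently
        return self.classifier(torch.cat([text_feat, image_feat], dim=-1))

# Illustrative forward pass: a tokenized (meme text, caption) pair plus a
# dummy tensor standing in for a CLIP-preprocessed 224x224 image.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = MemePersuasionClassifier()
batch = tokenizer("when the wifi drops", "A cat glares at a router.",
                  return_tensors="pt", truncation=True)
pixel_values = torch.randn(1, 3, 224, 224)
logits = model(batch["input_ids"], batch["attention_mask"], pixel_values)
probs = torch.sigmoid(logits)  # multi-label: train with BCEWithLogitsLoss
```

Sigmoid scoring treats each technique as an independent binary decision, a common simplification of hierarchical multi-label setups like this task's.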