Explainable Multimodal Sentiment Analysis on Bengali Memes (2401.09446v1)
Abstract: Memes have become a distinctive and effective form of communication in the digital era, engaging online communities and cutting across cultural barriers. Although memes are most often associated with humor, they can convey a wide range of emotions, including happiness, sarcasm, frustration, and more. Understanding and interpreting the sentiment underlying memes has therefore become crucial in the age of information. Previous research has explored text-based, image-based, and multimodal approaches, leading to the development of models such as CAPSAN and PromptHate for detecting various meme categories. However, work on low-resource languages such as Bengali memes remains scarce, with limited availability of publicly accessible datasets. A recent contribution is the MemoSen dataset, but its reported accuracy is notably low and its class distribution is imbalanced. In this study, we employed a multimodal approach combining ResNet50 and BanglishBERT, achieved a weighted F1-score of 0.71, compared it with unimodal approaches, and interpreted the models' behavior using explainable artificial intelligence (XAI) techniques.
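The abstract describes a multimodal pipeline that pairs a ResNet50 image encoder with a BanglishBERT text encoder. The following is a minimal sketch of one plausible realization, assuming late fusion by feature concatenation, a simple linear classification head, three sentiment classes, and the Hugging Face checkpoint name "csebuetnlp/banglishbert"; none of these details (fusion strategy, head architecture, class count) are specified in the abstract itself.

```python
# Hypothetical sketch: late-fusion ResNet50 + BanglishBERT classifier.
# Fusion strategy, hidden sizes, and class count are assumptions, not
# the authors' confirmed configuration.
import torch
import torch.nn as nn
from torchvision import models
from transformers import AutoModel


class MemeSentimentClassifier(nn.Module):
    def __init__(self, text_model_name="csebuetnlp/banglishbert", num_classes=3):
        super().__init__()
        # Image branch: ImageNet-pretrained ResNet50 with the final FC layer removed.
        resnet = models.resnet50(weights="IMAGENET1K_V1")
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 2048, 1, 1)
        # Text branch: BanglishBERT encoder (base-size transformer, hidden dim 768).
        self.text_encoder = AutoModel.from_pretrained(text_model_name)
        # Fusion: concatenate pooled image and [CLS] text features, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 768, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        img_feat = self.image_encoder(pixel_values).flatten(1)        # (B, 2048)
        txt_out = self.text_encoder(input_ids=input_ids,
                                    attention_mask=attention_mask)
        txt_feat = txt_out.last_hidden_state[:, 0]                    # [CLS] token, (B, 768)
        fused = torch.cat([img_feat, txt_feat], dim=1)                # (B, 2816)
        return self.classifier(fused)
```

Under this assumed setup, evaluation with a weighted F1-score (as reported in the abstract) could be computed with `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")`.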
- E. Hossain, O. Sharif, and M. M. Hoque, “MemoSen: A multimodal dataset for sentiment analysis of memes,” in Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022, pp. 1542–1554.
- E. Asmawati, A. Saikhu, and D. Siahaan, “Sentiment analysis of text memes: A comparison among supervised machine learning methods,” in 2022 9th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI). IEEE, 2022, pp. 349–354.
- N. V. Alluri and N. D. Krishna, “Multi modal analysis of memes for sentiment extraction,” in 2021 Sixth International Conference on Image Information Processing (ICIIP), vol. 6. IEEE, 2021, pp. 213–217.
- D. Kiela, H. Firooz, A. Mohan, V. Goswami, A. Singh, P. Ringshia, and D. Testuggine, “The hateful memes challenge: Detecting hate speech in multimodal memes,” Advances in neural information processing systems, vol. 33, pp. 2611–2624, 2020.
- S. Pramanick, D. Dimitrov, R. Mukherjee, S. Sharma, M. S. Akhtar, P. Nakov, and T. Chakraborty, “Detecting harmful memes and their targets,” arXiv preprint arXiv:2110.00413, 2021.
- C. Sharma, W. Paka, Scott, D. Bhageria, A. Das, S. Poria, T. Chakraborty, and B. Gambäck, “Task Report: Memotion Analysis 1.0 @SemEval 2020: The Visuo-Lingual Metaphor!” in Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020). Barcelona, Spain: Association for Computational Linguistics, Sep 2020.
- R. Cao, R. K.-W. Lee, W.-H. Chong, and J. Jiang, “Prompting for multimodal hateful meme classification,” arXiv preprint arXiv:2302.04156, 2023.
- E. Hossain, O. Sharif, and M. M. Hoque, “MUTE: A multimodal dataset for detecting hateful memes,” in Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop, 2022, pp. 32–39.
- M. R. Karim, S. K. Dey, T. Islam, M. Shajalal, and B. R. Chakravarthi, “Multimodal hate speech detection from bengali memes and texts,” in International Conference on Speech and Language Technologies for Low-resource Languages. Springer, 2022, pp. 293–308.
- N. Jannat, A. Das, O. Sharif, and M. M. Hoque, “An empirical framework for identifying sentiment from multimodal memes using fusion approach,” in 2022 25th International Conference on Computer and Information Technology (ICCIT). IEEE, 2022, pp. 791–796.
- J. Sun, H. Yin, Y. Tian, J. Wu, L. Shen, and L. Chen, “Two-level multimodal fusion for sentiment analysis in public security,” Security and Communication Networks, vol. 2021, pp. 1–10, 2021.
- N. Varma Alluri and N. Dheeraj Krishna, “Multimodal analysis of memes for sentiment extraction,” arXiv e-prints, 2021.
- M. Hodosh, P. Young, and J. Hockenmaier, “Framing image description as a ranking task: Data, models and evaluation metrics,” Journal of Artificial Intelligence Research, vol. 47, pp. 853–899, 2013.
- M. Hasan, N. Jannat, E. Hossain, O. Sharif, and M. M. Hoque, “Cuet-nlp@ dravidianlangtech-acl2022: Investigating deep learning techniques to detect multimodal troll memes,” in Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, 2022, pp. 170–176.
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
- M. A. Ullah, M. M. Islam, N. B. Azman, and Z. M. Zaki, “An overview of multimodal sentiment analysis research: Opportunities and difficulties,” in 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR). IEEE, 2017, pp. 1–6.
- T. Hasan, A. Bhattacharjee, K. Samin, M. Hasan, M. Basak, M. S. Rahman, and R. Shahriyar, “Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics, Nov. 2020, pp. 2612–2623. [Online]. Available: https://www.aclweb.org/anthology/2020.emnlp-main.207
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
- A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 1314–1324.
- G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
- A. Bhattacharjee, T. Hasan, K. Mubasshir, M. S. Islam, W. A. Uddin, A. Iqbal, M. S. Rahman, and R. Shahriyar, “BanglaBERT: Language model pretraining and benchmarks for low-resource language understanding evaluation in Bangla,” in Findings of the Association for Computational Linguistics: NAACL 2022, July 2022. [Online]. Available: https://arxiv.org/abs/2101.00204