Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models (2307.06979v2)
Abstract: With the rise of social media and online news sources, fake news has become a significant global problem. Detection of fake news in low-resource languages such as Bengali, however, has received limited research attention. In this paper, we propose a methodology consisting of four distinct approaches to classifying fake news articles in Bengali, using summarization and augmentation techniques with five pre-trained language models. Our approach includes translating English news articles and applying augmentation techniques to offset the shortage of fake news articles. We also summarize the news articles to work around the token length limit of BERT-based models. Through extensive experimentation and rigorous evaluation, we show the effectiveness of summarization and augmentation for Bengali fake news detection. We evaluated our models on three separate test datasets. The BanglaBERT Base model, combined with augmentation techniques, achieved an accuracy of 96% on the first test dataset. On the second test dataset, the BanglaBERT model trained on summarized, augmented news articles achieved 97% accuracy. Lastly, the mBERT Base model achieved an accuracy of 86% on the third test dataset, which was reserved for evaluating generalization performance. The datasets and implementations are available at https://github.com/arman-sakif/Bengali-Fake-News-Detection