Overview of the "Motamot" Dataset for Political Sentiment Analysis in Bengali
The paper "Motamot: A Dataset for Revealing the Supremacy of LLMs over Transformer Models in Bengali Political Sentiment Analysis" presents a detailed study of enhancing political sentiment analysis in the Bengali language by leveraging advanced NLP models. The work is set against the backdrop of Bangladeshi elections, emphasizing the importance of understanding public opinion and sentiment through online political discourse.
Dataset and Methodology
The authors introduce the "Motamot" dataset, comprising 7,058 labeled instances of political sentiment derived from various Bangladeshi online newspapers. This dataset is meticulously constructed to include diverse opinions, serving as an invaluable resource for research in political sentiment analysis within the Bengali context. The data points are categorized into positive and negative sentiments, reflecting the general political discourse observed during election periods.
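As a rough illustration of such binary-labeled data (the field names and Bengali snippets below are assumptions for the sketch, not the dataset's actual schema), a Motamot-style instance can be represented and the class distribution tallied like this:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record layout for a "Motamot"-style instance;
# the real dataset's column names may differ.
@dataclass
class Instance:
    text: str   # Bengali news headline or excerpt
    label: str  # "positive" or "negative"

sample = [
    Instance("নির্বাচনে ব্যাপক সাড়া", "positive"),
    Instance("ভোট নিয়ে অসন্তোষ", "negative"),
    Instance("প্রার্থীর জনপ্রিয়তা বাড়ছে", "positive"),
]

def class_distribution(data):
    """Count how many instances carry each sentiment label."""
    return Counter(inst.label for inst in data)

print(class_distribution(sample))  # Counter({'positive': 2, 'negative': 1})
```

Tallying the label distribution like this is a routine first check that a binary sentiment corpus is not severely imbalanced before training.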
To address the limited availability of annotated data in Bengali, the researchers approach sentiment analysis using both pre-trained language models (PLMs) and LLMs. The PLM experiments evaluate models such as BanglaBERT, Bangla BERT Base, XLM-RoBERTa, mBERT, and sahajBERT. Among these, BanglaBERT exhibited an impressive accuracy of 88.10%, underscoring its proficiency in processing Bengali text with minimal computational resources.
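The reported PLM scores are standard classification accuracies over a held-out split. A minimal sketch of that computation (the prediction and gold lists here are invented purely for illustration):

```python
def accuracy(predictions, gold):
    """Fraction of predicted sentiment labels that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("prediction/gold length mismatch")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy example: 7 of 8 labels match -> 87.50%, roughly the scale of
# BanglaBERT's reported 88.10% (this data is illustrative only).
preds = ["pos", "neg", "pos", "pos", "neg", "neg", "pos", "neg"]
gold  = ["pos", "neg", "pos", "neg", "neg", "neg", "pos", "neg"]
print(f"{accuracy(preds, gold):.2%}")  # 87.50%
```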
The paper further explores the capabilities of LLMs like Gemini 1.5 Pro and GPT-3.5 Turbo, particularly focusing on zero-shot and few-shot learning strategies. These techniques are crucial for contexts where language resources are sparse, as they minimize dependency on large labeled datasets. Notably, Gemini 1.5 Pro achieved a remarkable accuracy of 96.33% with few-shot learning, highlighting its potential as a superior performer in sentiment analysis tasks.
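Few-shot prompting typically means prepending a handful of labeled examples to the classification query before sending it to the model. A hedged sketch of such a prompt builder (the prompt wording and example texts are assumptions, not the paper's actual prompts):

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot sentiment-classification prompt:
    labeled examples are shown first, then the unlabeled query."""
    lines = ["Classify the sentiment of the Bengali text as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

# Two in-context examples followed by the text to classify.
shots = [
    ("নির্বাচনে ব্যাপক সাড়া", "positive"),
    ("ভোট নিয়ে অসন্তোষ", "negative"),
]
prompt = build_few_shot_prompt(shots, "প্রার্থীর জনপ্রিয়তা বাড়ছে")
print(prompt)
```

The resulting string would then be sent to an LLM such as Gemini 1.5 Pro or GPT-3.5 Turbo through its API; the model's completion after the final "Sentiment:" line is taken as the predicted label.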
Findings and Implications
The paper's experiments reveal significant performance disparities between PLMs and LLMs, particularly emphasizing the advantages LLMs hold in few-shot scenarios. This finding is pivotal given the limited annotated data available for Bengali political sentiment analysis. The research suggests that while zero-shot learning remains less effective due to language-specific challenges, few-shot learning provides a robust framework, outperforming PLMs while exhibiting minimal hallucination.
The implications of this work are manifold, both practically and theoretically. Practically, the dataset and methodologies provide a comprehensive framework for enhancing political decision-making and understanding voter sentiment in Bangladesh. Theoretically, the paper contributes to expanding the use of LLMs in low-resource languages, proposing models like Gemini 1.5 Pro as efficient alternatives for comprehensive sentiment analysis.
Future Directions
Looking ahead, the authors propose fine-grained sentiment analysis and multimodal analysis as key areas for future research. The integration of image, text, and other modalities may enrich sentiment analysis, providing a more holistic view of political opinion. Furthermore, incorporating explainable AI (XAI) techniques could unravel the decision-making processes of advanced NLP models, fostering trust and transparency in automated sentiment analysis systems.
In conclusion, this research presents significant advancements in the domain of political sentiment analysis in Bengali, leveraging the cutting-edge capabilities of LLMs. The introduction of the "Motamot" dataset marks a crucial step towards better understanding political dynamics during elections, offering a rich repository for future explorations in sentiment analysis within low-resource language settings.