
Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study (2304.04339v2)

Published 10 Apr 2023 in cs.CL and cs.AI

Abstract: Recently, ChatGPT has drawn great attention from both the research community and the public. We are particularly interested in whether it can serve as a universal sentiment analyzer. To this end, in this work, we provide a preliminary evaluation of ChatGPT on the understanding of \emph{opinions}, \emph{sentiments}, and \emph{emotions} contained in the text. Specifically, we evaluate it in three settings, including \emph{standard} evaluation, \emph{polarity shift} evaluation and \emph{open-domain} evaluation. We conduct an evaluation on 7 representative sentiment analysis tasks covering 17 benchmark datasets and compare ChatGPT with fine-tuned BERT and corresponding state-of-the-art (SOTA) models on them. We also attempt several popular prompting techniques to elicit the ability further. Moreover, we conduct human evaluation and present some qualitative case studies to gain a deep comprehension of its sentiment analysis capabilities.

Evaluation of ChatGPT as a Sentiment Analyzer

The paper titled "Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study" addresses the capabilities of ChatGPT in performing sentiment analysis across various tasks. The researchers assess ChatGPT's performance against fine-tuned BERT models and contemporary state-of-the-art (SOTA) models on 17 benchmark datasets, encompassing seven prominent sentiment analysis tasks. The core intent is to determine ChatGPT's viability as a universal sentiment analyzer, with specific interest in scenarios involving polarity shifts and the challenges of open-domain sentiment analysis.

The paper methodically evaluates ChatGPT's performance in different contexts. First, it emphasizes ChatGPT's zero-shot capabilities, where the model performs sentiment classification nearly on par with fine-tuned BERT models. It notably lags behind domain-specific SOTA models but delivers a solid baseline without any fine-tuning on labeled data, which reinforces its utility in scenarios where training data are scarce. This observation is supported by ChatGPT's reasonable yet somewhat inconsistent results on tasks such as Aspect-Based Sentiment Classification (ABSC) and End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA).
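The zero-shot setting described above amounts to wrapping each input sentence in a natural-language instruction and mapping the model's free-form reply back to a discrete label. The paper's exact prompt templates are not reproduced here; the sketch below is an illustrative approximation, and both function names are hypothetical:

```python
def build_zero_shot_prompt(sentence: str) -> str:
    """Build a minimal zero-shot sentiment-classification prompt.

    The wording is illustrative; the paper's actual templates may differ.
    """
    return (
        "Please classify the sentiment of the following sentence "
        "as positive, negative, or neutral.\n"
        f"Sentence: {sentence}\n"
        "Sentiment:"
    )

def parse_label(response: str) -> str:
    """Map a free-form model reply to one of the three canonical labels."""
    text = response.strip().lower()
    for label in ("positive", "negative", "neutral"):
        if label in text:
            return label
    return "unknown"

prompt = build_zero_shot_prompt("The battery life is amazing.")
print(parse_label("Positive."))  # -> positive
```

A robust label parser matters in practice: generative models often reply with full sentences ("The sentiment is positive.") rather than the bare label that automatic evaluation expects.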

The research identifies instances wherein ChatGPT's sentiment predictions, notably in Comparative Sentences Identification (CSI) and Comparative Element Extraction (CEE), do not match the dataset-labelled ground truth. A detailed human evaluation suggests these outputs may be conceptually accurate despite not aligning with specific annotations, showcasing potential misalignments between generative model outputs and rigid annotations.

Polarity shift evaluation, which focuses on negation and speculative language, demonstrates that ChatGPT outperforms fine-tuned BERT models, particularly in sentiment classification for sentences exhibiting such shifts. The results suggest an inherent robustness in ChatGPT for handling linguistically challenging scenarios without domain-specific training.

The open-domain evaluation reveals ChatGPT's adaptability across diverse datasets, outperforming multi-domain BERT models in several tasks. Despite displaying robust general performance, ChatGPT struggles in content-rich or less-common domains such as medicine and social media, highlighting areas for improvement. Nevertheless, ChatGPT's ability to approach human-level judgments in subjective tasks, as indicated by human evaluations, underscores its potential as a versatile sentiment analysis tool.

The paper also examines advanced prompting methods: few-shot prompting effectively boosts ChatGPT's performance on these tasks, while Chain-of-Thought (CoT) and self-consistency techniques, applied to further enhance its few-shot capabilities, yield mixed results. Self-consistency reliably increases accuracy, whereas CoT does not bring significant improvements, suggesting that the benefit of such techniques may vary with the complexity or nature of the task.
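Self-consistency, as used here, can be sketched as sampling several completions (e.g. at temperature > 0), extracting a final label from each, and keeping the majority answer. The sampled labels below are hypothetical stand-ins for real model outputs:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among sampled completions.

    Self-consistency samples multiple reasoning paths and aggregates
    their final answers, smoothing over individual sampling errors.
    """
    return Counter(answers).most_common(1)[0][0]

# Hypothetical labels extracted from five sampled completions:
sampled = ["positive", "positive", "neutral", "positive", "negative"]
print(majority_vote(sampled))  # -> positive
```

This makes the mechanism of the observed accuracy gain concrete: even if individual samples occasionally flip to a wrong label, the aggregated vote tends toward the model's dominant prediction.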

In summary, the paper affirms that ChatGPT exhibits substantial promise as a universal sentiment analyzer, particularly in settings where traditional data-intensive models are impractical. While its performance can rival existing models in many conditions, ChatGPT still underperforms fine-tuned models in highly specialized domains. These observations highlight the dynamic interplay between model generality and specificity in sentiment analysis tasks. Future developments could focus on improving ChatGPT's capabilities in recognizing nuanced, domain-specific sentiments and ambiguous linguistic constructs—especially within implicit sentiment contexts. This research thus serves as a foundation for further exploration into deploying LLMs in comprehensive sentiment analysis frameworks.

Authors (6)
  1. Zengzhi Wang
  2. Qiming Xie
  3. Zixiang Ding
  4. Yi Feng
  5. Rui Xia
  6. Zinong Yang
Citations (128)