Comparing and Combining Sentiment Analysis Methods (1406.0032v1)

Published 30 May 2014 in cs.CL

Abstract: Several messages express opinions about events, products, and services, political views or even their author's emotional state and mood. Sentiment analysis has been used in several applications including analysis of the repercussions of events in social networks, analysis of opinions about products and services, and simply to better understand aspects of social communication in Online Social Networks (OSNs). There are multiple methods for measuring sentiments, including lexical-based approaches and supervised machine learning methods. Despite the wide use and popularity of some methods, it is unclear which method is better for identifying the polarity (i.e., positive or negative) of a message as the current literature does not provide a method of comparison among existing methods. Such a comparison is crucial for understanding the potential limitations, advantages, and disadvantages of popular methods in analyzing the content of OSNs messages. Our study aims at filling this gap by presenting comparisons of eight popular sentiment analysis methods in terms of coverage (i.e., the fraction of messages whose sentiment is identified) and agreement (i.e., the fraction of identified sentiments that are in tune with ground truth). We develop a new method that combines existing approaches, providing the best coverage results and competitive agreement. We also present a free Web service called iFeel, which provides an open API for accessing and comparing results across different sentiment methods for a given text.

Authors (4)

Pollyanna Gonçalves
Fabrício Benevenuto
Meeyoung Cha
Matheus Araújo

Summary

An Analytical Overview of "Comparing and Combining Sentiment Analysis Methods"

The paper "Comparing and Combining Sentiment Analysis Methods" by Pollyanna Gonçalves et al. presents a detailed comparative study of eight prevalent sentiment analysis techniques. It addresses a gap in the existing literature: the lack of a systematic comparison of how well sentiment methods analyze text from Online Social Networks (OSNs). By comparing methods in terms of coverage and agreement, the paper evaluates their applicability to different contexts.

Overview of Sentiment Analysis Methods

The paper explores two broad categories of sentiment analysis approaches: lexical-based and machine learning-based methods. Machine learning methods, despite offering adaptability through trained models, face limitations due to the necessity of labeled data. In contrast, lexical-based methods use predefined word lists linked to sentiments but struggle with linguistic variations such as slang, which are prevalent in OSNs.
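To make the lexical-based idea concrete, the following minimal sketch scores a message against a predefined word list. The lexicon, the function name `lexicon_polarity`, and the tokenization rule are illustrative assumptions and do not reproduce any of the methods studied in the paper.

```python
import re

# Hypothetical toy lexicon; real methods rely on much larger, validated lists.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "sad", "hate", "awful"}

def lexicon_polarity(text):
    """Return 'positive' or 'negative', or None when no lexicon word
    matches or the positive and negative counts tie."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds 1, each negative word subtracts 1.
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None

print(lexicon_polarity("I love this phone"))  # matches 'love'
print(lexicon_polarity("gr8 day lol"))        # slang is absent from the lexicon
```

The second call illustrates the weakness noted above: slang such as "gr8" is not in the word list, so the message receives no polarity at all.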

Eight methods are rigorously analyzed: LIWC, Happiness Index, SentiWordNet, SASA, PANAS-t, Emoticons, SenticNet, and SentiStrength. Each method provides unique mechanisms and validations—ranging from psychometric scales like PANAS-t to semantic inference in SenticNet, and emotive representations in Emoticons—offering diverse perspectives in sentiment analysis.

Comparative Evaluation

The evaluation uses two datasets: a large Twitter log capturing public reactions to events and a collection of texts labeled by humans. Performance is measured in terms of coverage (the fraction of messages whose sentiment a method can identify) and agreement (the fraction of identified sentiments that match a known ground truth).
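The two metrics can be sketched in a few lines, following the definitions given in the abstract. The function names and the convention that `None` marks a message the method could not classify are assumptions made for this illustration, and the sample labels are invented.

```python
def coverage(predictions):
    """Fraction of messages for which the method returned a polarity."""
    if not predictions:
        return 0.0
    return sum(p is not None for p in predictions) / len(predictions)

def agreement(predictions, truth):
    """Fraction of identified polarities that match the ground-truth label."""
    # Messages the method could not classify (None) are left out.
    pairs = [(p, t) for p, t in zip(predictions, truth) if p is not None]
    if not pairs:
        return 0.0
    return sum(p == t for p, t in pairs) / len(pairs)

# Invented example: the method abstains on the second message.
preds = ["positive", None, "negative", "positive"]
gold = ["positive", "negative", "negative", "negative"]

print(coverage(preds))         # 3 of 4 messages classified
print(agreement(preds, gold))  # 2 of the 3 classified messages are correct
```

Because agreement is computed only over classified messages, a method can score high agreement while covering very little of the data, which is the trade-off the comparison is designed to expose.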

Key findings indicate a notable disparity in the coverage and agreement among different methods, highlighting their diverse levels of effectiveness. No single method consistently outperforms others across all scenarios. For instance, Emoticons exhibited high agreement rates but suffered from limited coverage. Conversely, methods like SenticNet and SentiWordNet demonstrated extensive coverage but variable agreement rates.

Development and Utility of Combined-Method

To address these discrepancies, the authors introduce a Combined-method, which amalgamates strengths from multiple existing methods to optimize coverage and maintain competitive agreement levels. This synthesis presents an innovative approach, suggesting a potential trajectory for future sentiment analysis research.
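This summary does not spell out how the Combined-method merges the individual outputs. As a rough illustration only, the sketch below assumes a weighted vote in which methods that cannot classify a message abstain. The function `combine`, the method names, and the weights are all hypothetical and should not be read as the authors' actual procedure.

```python
from collections import defaultdict

def combine(outputs, weights):
    """Weighted vote over method outputs; methods returning None abstain.

    outputs: dict mapping method name -> 'positive', 'negative', or None
    weights: dict mapping method name -> vote weight (default 1.0)
    """
    totals = defaultdict(float)
    for method, label in outputs.items():
        if label is not None:
            totals[label] += weights.get(method, 1.0)
    if not totals:
        return None  # no method could classify the message
    # On a tie, the label that was voted for first wins.
    return max(totals, key=totals.get)

# Hypothetical method names and weights, for illustration only.
outputs = {"emoticons": None, "lexicon_a": "positive", "lexicon_b": "negative"}
weights = {"lexicon_a": 0.6, "lexicon_b": 0.8}
print(combine(outputs, weights))
```

Under this assumption, the combined output is undefined only when every method abstains, so its coverage is at least as high as that of any single method.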

Practical Application: iFeel System

The research extends into the practical domain via the iFeel system, an open-access web service that allows for comparative analysis of sentiment methods on user-input text. This tool, excluding methods with proprietary constraints like LIWC, supports researchers and developers in selecting appropriate methods for specific tasks, illustrating a practical application of the theoretical framework presented in the paper.

Implications and Future Directions

The implications of this paper are substantial for both theory and practice. The analysis provides critical insights into the applicability of sentiment methods across various contexts and underscores the necessity for contextual adaptation in sentiment analysis tools. The paper advocates for the continued integration of diverse sentiment methods to enhance analytical reliability and suggests expansion into additional sentiment dimensions beyond polarity.

The paper's approach emphasizes the importance of aligning analytical methods with the dynamic, evolving language of OSNs. Future work could augment the Combined-method with more diverse datasets and fully integrated machine learning techniques, making it more robust to the rapidly changing linguistic patterns of social media discourse. Additionally, expanding iFeel's capabilities to handle larger datasets could enhance its utility for real-time sentiment monitoring applications.

In conclusion, this detailed comparative evaluation offers a structured pathway for advancing sentiment analysis methodologies, reflecting a significant contribution to the domain of computational social science.
