
SemEval-2013 Task 2: Sentiment Analysis in Twitter (1912.06806v1)

Published 14 Dec 2019 in cs.CL, cs.IR, and cs.LG

Abstract: In recent years, sentiment analysis in social media has attracted a lot of research interest and has been used for a number of applications. Unfortunately, research has been hindered by the lack of suitable datasets, complicating the comparison between approaches. To address this issue, we have proposed SemEval-2013 Task 2: Sentiment Analysis in Twitter, which included two subtasks: A, an expression-level subtask, and B, a message-level subtask. We used crowdsourcing on Amazon Mechanical Turk to label a large Twitter training dataset along with additional test sets of Twitter and SMS messages for both subtasks. All datasets used in the evaluation are released to the research community. The task attracted significant interest and a total of 149 submissions from 44 teams. The best-performing team achieved an F1 of 88.9% and 69% for subtasks A and B, respectively.

Citations (379)

Summary

  • The paper defines and evaluates sentiment analysis methods for short, informal text like Twitter and SMS through two subtasks: phrase-level contextual polarity and message-level polarity classification.
  • It details the creation and annotation process for robust Twitter datasets used in the competition, which were subsequently released to the research community.
  • The best system achieved F1-scores of 88.9% for phrase-level and 69% for message-level tasks, highlighting the difficulty and contributing valuable benchmark results for the field.

Insights into SemEval-2013 Task 2: Sentiment Analysis in Twitter

The paper "SemEval-2013 Task 2: Sentiment Analysis in Twitter" details a comparative analysis of sentiment analysis techniques aimed at microblogging platforms, with a particular focus on Twitter and SMS messages. This task, embedded within the SemEval series of semantic evaluation workshops, provided a unique opportunity to address sentiment analysis within informal, concise text formats which present inherent challenges for NLP.

Task Breakdown and Objectives

The task was subdivided into two key subtasks:

  1. Subtask A: Contextual Polarity Disambiguation - This involved determining the polarity (positive, negative, or neutral) of a marked phrase within the context of a message.
  2. Subtask B: Message Polarity Classification - For this subtask, participants were required to determine the sentiment of an entire message, choosing the strongest sentiment if a message conveyed both positive and negative sentiments.

The two subtasks probe different facets of sentiment polarity, requiring approaches adept at both fine-grained contextual understanding and message-wide sentiment inference.
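To make the contrast between the two subtasks concrete, here is a minimal lexicon-scoring sketch of both settings. The function names, lexicon entries, and token-index span convention are purely illustrative; they do not reflect the official data format or any participant's system.

```python
def classify_subtask_a(message, start, end, lexicon):
    """Subtask A sketch: polarity of only the marked token span [start, end]."""
    span = message.split()[start:end + 1]
    score = sum(lexicon.get(t.lower().strip(",.!?"), 0) for t in span)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def classify_subtask_b(message, lexicon):
    """Subtask B sketch: polarity of the whole message; the stronger
    sentiment wins when both polarities are present."""
    score = sum(lexicon.get(t.lower().strip(",.!?"), 0)
                for t in message.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Note how the same message can yield different answers at the two granularities: a phrase marked inside a mostly positive message can still be negative in Subtask A, while Subtask B aggregates over everything.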

Data Collection and Annotation

A significant contribution of this research was the development of robust annotated datasets. Twitter data was the cornerstone, due to the platform's widespread use for personal expression. The collection process entailed filtering Twitter messages based on entities and sentiment lexicons from SentiWordNet, while the annotation leveraged Amazon Mechanical Turk to label sentiment expressions at phrase and message levels. These datasets, marked by positive, negative, or neutral sentiments, were subsequently released to the research community, fostering future studies.
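The lexicon-based filtering step described above can be sketched as keeping only messages that contain at least one sentiment-bearing word. The word list below is a tiny made-up stand-in for SentiWordNet, used here only to show the shape of the filter.

```python
# Stand-in for a real sentiment lexicon such as SentiWordNet.
SENTIMENT_WORDS = {"good", "bad", "happy", "sad", "love", "hate"}

def keep_for_annotation(tweet):
    """Retain a message for human annotation only if it contains
    at least one word from the sentiment lexicon."""
    tokens = {t.lower().strip("#@,.!?") for t in tweet.split()}
    return bool(tokens & SENTIMENT_WORDS)
```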

Participant Overview and System Designs

The task attracted substantial engagement with 149 submissions from 44 teams. Participants developed both constrained and unconstrained systems. Constrained systems were reliant solely on the provided dataset, though external resources such as sentiment lexicons were permissible. Unconstrained systems, conversely, could utilize additional datasets for system training and development.

Predominantly, supervised learning strategies were employed, with classifiers such as Support Vector Machines (SVM) and Naive Bayes being prevalent. In addition to standard word-related features, systems frequently incorporated sentiment lexicons and Twitter-specific features such as hashtags and emoticons, highlighting diverse feature engineering methodologies.
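As a rough illustration of that feature engineering, the sketch below extracts a few of the commonly used signals (lexicon polarity, hashtags, emoticons, character elongation). The lexicon entries and feature names are invented for the example; a real participant system would feed such features to an SVM or Naive Bayes classifier.

```python
import re

# Tiny illustrative lexicon; real systems drew on resources
# such as SentiWordNet or the MPQA subjectivity lexicon.
LEXICON = {"love": 1, "great": 1, "awful": -1, "hate": -1}
EMOTICONS = {":)", ":-)", ":(", ":-(", ":D"}

def features(message):
    """Extract a handful of hand-engineered features of the kind
    common among participating systems."""
    tokens = message.lower().split()
    return {
        "num_tokens": len(tokens),
        "lexicon_score": sum(LEXICON.get(t.strip("#@,.!?"), 0) for t in tokens),
        "num_hashtags": sum(1 for t in tokens if t.startswith("#")),
        "num_emoticons": sum(1 for t in tokens if t in EMOTICONS),
        # Elongated words ("sooooo") often signal emphasis in tweets.
        "has_elongation": bool(re.search(r"(\w)\1{2,}", message)),
    }
```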

Results and Implications

The best-performing system achieved F1-scores of 88.9% and 69% on the phrase-level and message-level subtasks respectively, underscoring the gap in difficulty between fine-grained phrase analysis and whole-message sentiment determination.
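For reference, the task scored systems by an F1 averaged over the positive and negative classes; a minimal implementation of that measure might look like the following (label strings are illustrative).

```python
def f1_pos_neg(gold, pred):
    """Average F1 over the positive and negative classes; the neutral
    class is scored only indirectly, via the errors it induces."""
    def f1(cls):
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return (f1("positive") + f1("negative")) / 2
```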

The implications of this work are substantial, illustrating the gains in sentiment analysis that become possible when datasets are robust and diverse. The release of these datasets under a Creative Commons license further exemplifies a commitment to advancing the field.

Conclusion and Future Directions

This paper's contribution lies not only in its analysis of sentiment within social media datasets but also in setting a robust precedent for subsequent sentiment analysis competitions. Future research could benefit from this dataset by exploring advancements in deep learning and transfer learning, potentially overcoming the challenges posed by informal text. Such exploration may align with the continued evolution of consumer-generated content across new social media platforms, thus maintaining the relevance and utility of this foundational work. The landscape of sentiment analysis in microblogging is ripe for further innovation, with SemEval-2013 Task 2 serving as a pivotal stepping stone.