
SemEval-2017 Task 4: Sentiment Analysis in Twitter (1912.00741v1)

Published 2 Dec 2019 in cs.CL, cs.IR, and cs.LG

Abstract: This paper describes the fifth year of the Sentiment Analysis in Twitter task. SemEval-2017 Task 4 continues with a rerun of the subtasks of SemEval-2016 Task 4, which include identifying the overall sentiment of the tweet, sentiment towards a topic with classification on a two-point and on a five-point ordinal scale, and quantification of the distribution of sentiment towards a topic across a number of tweets: again on a two-point and on a five-point ordinal scale. Compared to 2016, we made two changes: (i) we introduced a new language, Arabic, for all subtasks, and (ii)~we made available information from the profiles of the Twitter users who posted the target tweets. The task continues to be very popular, with a total of 48 teams participating this year.

Citations (759)

Summary

  • The paper presents a comprehensive framework for Twitter sentiment analysis through five subtasks, covering both classification and sentiment quantification.
  • It introduces innovations by incorporating Arabic language data and user profile information to enhance context-aware sentiment models.
  • Evaluations show that deep learning methods like CNNs and LSTMs effectively manage the complexities of social media text analysis.

Sentiment Analysis in Twitter: An Overview of SemEval-2017 Task 4

SemEval-2017 Task 4 focuses on the important and challenging problem of sentiment analysis in Twitter. Since its inception, the task has drawn significant attention due to the exponential growth of social media and the consequent large volume of user-generated content that requires analysis. In 2017, its fifth consecutive year, the task encompassed five subtasks, each targeting a specific dimension of sentiment analysis, and introduced notable innovations that broadened its overall objectives.

Task Setup and Subtasks

SemEval-2017 Task 4 comprised five distinct subtasks, each run in both English and Arabic. Together, these subtasks aimed to provide comprehensive coverage of sentiment analysis problems on Twitter:

  1. Subtask A: Given a tweet, classify its overall sentiment as Positive, Negative, or Neutral.
  2. Subtask B: Given a tweet and a topic, classify the sentiment towards that topic on a binary scale: Positive vs. Negative.
  3. Subtask C: Given a tweet and a topic, classify the sentiment towards that topic on a five-point scale: StronglyPositive, WeaklyPositive, Neutral, WeaklyNegative, and StronglyNegative.
  4. Subtask D: Given a set of tweets about a topic, estimate the distribution of tweets into Positive and Negative classes.
  5. Subtask E: Given a set of tweets about a topic, estimate the distribution of tweets across five sentiment classes: StronglyPositive, WeaklyPositive, Neutral, WeaklyNegative, and StronglyNegative.

These subtasks extended beyond mere classification to involve sentiment quantification and ordinal regression, reflecting real-world applications where aggregate sentiment statistics are often more relevant than individual tweet sentiments.
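The simplest baseline for the quantification subtasks (D and E) is "classify and count": run a per-tweet classifier, then report the relative frequency of each predicted class. The sketch below illustrates this idea in plain Python; the function name and the example predictions are illustrative, not from the paper.

```python
from collections import Counter

def classify_and_count(predicted_labels, classes):
    """Naive 'classify and count' quantification: estimate a topic's
    sentiment distribution from per-tweet predicted labels."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: counts.get(c, 0) / total for c in classes}

# Hypothetical per-tweet predictions for one topic (a Subtask D setting).
preds = ["Positive", "Positive", "Negative", "Positive"]
print(classify_and_count(preds, ["Positive", "Negative"]))
# {'Positive': 0.75, 'Negative': 0.25}
```

Note that classify-and-count is biased when the classifier's error rates differ across classes, which is one reason the task evaluates quantification directly rather than deriving it from classification scores.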

Innovations in 2017

Introduction of Arabic Language

In 2017, Task 4 introduced the Arabic language for all subtasks. This addition aimed to foster multilingual and cross-lingual approaches in sentiment analysis and expand the resources for Arabic sentiment analysis. The adoption of Arabic posed unique challenges due to its rich morphology and the prevalence of dialectal variations on Twitter. These complexities were addressed by developing a new Arabic Twitter dataset annotated for sentiment.

Use of User Information

Another significant innovation in 2017 was the availability of user profile information. Participants were encouraged to leverage information retrieved from the users' public Twitter profiles, such as age, location, and the sentiment of tweets posted by the users' friends. The integration of this data aimed to enhance sentiment analysis models by incorporating user-specific context.

Data Annotation and Evaluation

The datasets for task evaluation comprised annotated tweets where multiple annotators assigned sentiment labels. The annotation process included stringent quality controls and consolidation mechanisms to ensure high-quality labels. The evaluation measures varied across subtasks to reflect their specific nature:

  • Subtask A: Evaluated using average recall (AvgRec), accuracy, and F1.
  • Subtask B: Primary measure AvgRec, supplemented by accuracy and F1.
  • Subtask C: Evaluated using macro-average mean absolute error (MAE^M) and standard mean absolute error (MAE^μ).
  • Subtasks D and E: Relied upon quantification error measures such as Kullback-Leibler Divergence (KLD), Absolute Error (AE), and Earth Mover's Distance (EMD).
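To make the measures concrete, here is a minimal pure-Python sketch of three of them: macro-averaged recall (AvgRec), a smoothed KLD between gold and estimated distributions, and EMD for ordered classes (computed as the sum of absolute differences between cumulative distributions). The function names and the smoothing constant are illustrative assumptions; the official scorer may differ in details such as the exact smoothing scheme.

```python
import math

def avg_rec(gold, pred, classes):
    # Macro-averaged recall: mean of per-class recall,
    # robust to imbalanced class distributions.
    recalls = []
    for c in classes:
        idx = [i for i, g in enumerate(gold) if g == c]
        if idx:
            recalls.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def kld(p, q, eps=1e-9):
    # Kullback-Leibler divergence between the gold distribution p and
    # the estimated distribution q, smoothed with eps to avoid log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def emd(p, q):
    # Earth Mover's Distance over ordinally arranged classes:
    # sum of absolute differences of the cumulative distributions.
    total, cp, cq = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cp += pi
        cq += qi
        total += abs(cp - cq)
    return total
```

Unlike KLD, EMD accounts for the ordering of the five-point scale, so confusing StronglyPositive with WeaklyPositive is penalized less than confusing it with StronglyNegative.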

Results and Reflections

Participation in 2017 was robust with 48 teams across different subtasks. The most popular was Subtask A, although there was significant participation in topic-specific subtasks as well. Methods employing deep learning, especially Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), dominated the top-performing systems. For instance, the top-ranked teams in Subtasks A through E often featured deep learning models enhanced with ensemble approaches or specialized attention mechanisms.
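A common way the top systems combined their CNN and LSTM models was simple hard majority voting over per-tweet predictions. The sketch below shows this ensembling step under that assumption; the model names and predictions are hypothetical, not results from the paper.

```python
from collections import Counter

def ensemble_vote(predictions_per_model):
    """Hard majority voting: for each tweet, return the label most
    models agreed on (ties broken by first-seen label)."""
    n_tweets = len(predictions_per_model[0])
    combined = []
    for i in range(n_tweets):
        votes = Counter(model[i] for model in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical predictions from three base models on four tweets.
cnn  = ["Positive", "Neutral",  "Negative", "Positive"]
lstm = ["Positive", "Positive", "Negative", "Neutral"]
svm  = ["Neutral",  "Positive", "Negative", "Positive"]
print(ensemble_vote([cnn, lstm, svm]))
# ['Positive', 'Positive', 'Negative', 'Positive']
```

In practice, many participating systems weighted or averaged model confidence scores rather than hard labels, but the aggregation principle is the same.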

In the case of the Arabic language subtasks, despite the nascent stage of resources, participants managed to achieve competitive performances, underscoring the effectiveness of the newly introduced multilingual tracks.

Implications and Future Work

The innovations and results from SemEval-2017 Task 4 have considerable implications for sentiment analysis research and applications. Introducing a new language and encouraging the use of user information have opened avenues for richer and more context-aware sentiment models. Practically, improved sentiment analysis tools hold promise for social media monitoring, market analysis, and sociopolitical sentiment tracking.

Future directions may include expanding to more languages, incorporating deeper user profile analysis, and integrating conversational context in sentiment analysis models. There is also a potential to explore the intersections of sentiment analysis with other subtasks such as irony detection and emotion recognition, fostering holistic and nuanced sentiment insights.

Overall, SemEval-2017 Task 4 has substantially contributed to the field by providing extensive datasets, standardized evaluation protocols, and fostering innovation through collaborative challenges.