SemEval-2014 Task 9: Sentiment Analysis in Twitter (1912.02990v1)

Published 6 Dec 2019 in cs.CL, cs.IR, cs.LG, and cs.SI

Abstract: We describe the Sentiment Analysis in Twitter task, ran as part of SemEval-2014. It is a continuation of the last year's task that ran successfully as part of SemEval-2013. As in 2013, this was the most popular SemEval task; a total of 46 teams contributed 27 submissions for subtask A (21 teams) and 50 submissions for subtask B (44 teams). This year, we introduced three new test sets: (i) regular tweets, (ii) sarcastic tweets, and (iii) LiveJournal sentences. We further tested on (iv) 2013 tweets, and (v) 2013 SMS messages. The highest F1-score on (i) was achieved by NRC-Canada at 86.63 for subtask A and by TeamX at 70.96 for subtask B.

Citations (583)

Summary

  • The paper presents a comprehensive challenge on Twitter sentiment analysis, detailing task organization and participant outcomes.
  • It describes Mechanical Turk annotation of the datasets and participant systems built on classifiers such as SVM and MaxEnt, alongside emerging deep learning techniques.
  • Key insights highlight the superior performance of constrained models and the need for improved sarcasm detection and domain adaptation.

Sentiment Analysis in Twitter: Insights from SemEval-2014 Task 9

The paper details the organization and results of SemEval-2014 Task 9, which focused on sentiment analysis in social media, specifically Twitter. Building on the previous year's task, it drew broad participation, with 46 teams contributing across two subtasks. The task underscores the complexities and methodologies of sentiment analysis in informal communication channels such as microblogs.

Task Objectives and Structure

The task comprised two main subtasks:

  1. Subtask A: Contextual Polarity Disambiguation - Focused on determining the sentiment (positive, negative, neutral) associated with specific words or phrases within tweets.
  2. Subtask B: Message Polarity Classification - Aimed at classifying the sentiment conveyed by entire tweets.

The challenge incorporated new test sets, including datasets of sarcastic tweets and LiveJournal sentences, to evaluate cross-domain adaptability and the robustness of sentiment classification systems.
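
To make the two subtask formats concrete, the following sketch illustrates the input/output contract in Python; the example tweet, character offsets, and field names are hypothetical and not drawn from the task data.

```python
# Hypothetical instances illustrating the two subtask formats (not real dataset entries).

# Subtask A: contextual polarity disambiguation — label a marked span inside the tweet.
subtask_a_instance = {
    "tweet": "The new phone is great, but the battery life is a joke.",
    "span": (48, 54),           # character offsets such that tweet[48:54] == "a joke"
    "target_phrase": "a joke",
    "label": "negative",        # sentiment of the phrase in this context
}

# Subtask B: message polarity classification — label the entire tweet.
subtask_b_instance = {
    "tweet": "The new phone is great, but the battery life is a joke.",
    "label": "neutral",         # overall message-level sentiment (positive/negative/neutral)
}
```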

Dataset and Methodology

The datasets drew on diverse sources: the 2013 Twitter and SMS datasets, augmented with new 2014 tweets, sarcastic tweets, and LiveJournal sentences. Annotation was performed on Mechanical Turk, with multiple annotations collected per instance and aggregated to ensure label quality.
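
The paper reports aggregating several crowd annotations per instance; one common aggregation rule is majority voting, sketched below. The agreement threshold and handling of ties here are illustrative assumptions, not necessarily the organizers' exact procedure.

```python
from collections import Counter

def aggregate_annotations(labels, min_agreement=3):
    """Collapse multiple Mechanical Turk labels for one instance into a single label.

    Returns the majority label if at least `min_agreement` annotators agree,
    otherwise None (the instance would be discarded or re-annotated).
    Illustrative only; the task's actual adjudication rule may differ.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_agreement else None

# Example: five workers label the same tweet.
print(aggregate_annotations(["positive", "positive", "neutral", "positive", "negative"]))
# -> 'positive'
print(aggregate_annotations(["positive", "negative", "neutral", "positive", "negative"]))
# -> None (no label reaches 3 votes)
```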

Evaluation and Results

Performance was gauged using F1-scores for both subtasks across the test sets. The constrained system by NRC-Canada led Subtask A with an F1-score of 86.63 on the 2014 tweets, while TeamX achieved the highest Subtask B score on the same data at 70.96.
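
As in the 2013 edition, the reported scores are F1 averaged over the positive and negative classes (neutral is predicted but not included in the average). A minimal sketch assuming that definition:

```python
def f1_for_class(gold, pred, cls):
    """Precision/recall/F1 for a single sentiment class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def semeval_f1(gold, pred):
    """Average F1 over the positive and negative classes only."""
    return (f1_for_class(gold, pred, "positive") + f1_for_class(gold, pred, "negative")) / 2

gold = ["positive", "negative", "neutral", "positive", "negative"]
pred = ["positive", "neutral", "neutral", "positive", "negative"]
print(round(semeval_f1(gold, pred), 4))  # -> 0.8333
```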

Some key observations include:

  • Constrained vs. Unconstrained Systems: Constrained systems, trained only on the provided data, generally outperformed unconstrained systems that were allowed to use additional data. This suggests domain mismatches in the extra resources or their underutilization.
  • Algorithmic Trends: Most participating teams relied on SVM, MaxEnt, and Naive Bayes classifiers, but there was notable adoption of deep learning techniques, particularly by the coooolll and ThinkPositive teams, signaling a shift toward more complex models.
  • Preprocessing Techniques: Prevalent strategies handled social media-specific phenomena such as emoticons and abbreviations; this step remains crucial for effective sentiment classification, and a hedged sketch of such a pipeline follows this list.
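
Many constrained systems combined Twitter-specific normalization with a linear classifier over word n-grams. The sketch below illustrates such a pipeline using scikit-learn; the normalization rules, features, and parameters are illustrative choices, not those of any particular participant.

```python
import re
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def normalize_tweet(text):
    """Basic Twitter normalization: collapse URLs and @mentions, lowercase, keep emoticons."""
    text = re.sub(r"https?://\S+", " <url> ", text)
    text = re.sub(r"@\w+", " <user> ", text)
    return text.lower()

# Constrained setup: train only on the provided tweets, no external resources.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=normalize_tweet,
                              token_pattern=r"[^\s]+",   # keep emoticons like :) as tokens
                              ngram_range=(1, 2))),
    ("svm", LinearSVC(C=1.0)),
])

# Toy training data; real systems used the task's annotated tweets.
train_texts = ["I love this! :)", "Worst service ever @airline", "New episode airs tonight http://t.co/x"]
train_labels = ["positive", "negative", "neutral"]
pipeline.fit(train_texts, train_labels)
print(pipeline.predict(["This phone is amazing :)"]))
```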

Implications and Future Directions

The results from SemEval-2014 Task 9 substantiate several implications for future research:

  • Sarcasm Detection: Given the observed performance drop on sarcastic tweets, explicitly modeling sarcasm is a clear priority for future sentiment analysis research.
  • Domain Adaptation: While systems were primarily tuned for Twitter, there is a marked interest in devising methodologies applicable across various informal text domains.
  • Advanced Models: The emergence of deep learning approaches signals a shift toward more complex models, warranting further exploration of hybrid and ensemble methods that leverage diverse data sources more effectively.

The paper concludes with plans to continue this line of inquiry in subsequent SemEval editions, potentially expanding the focus areas and refining the evaluation metrics to better capture sentiment across domains. This ongoing work holds promise for advancing automated sentiment analysis within ever-evolving online communication.