A new ANEW: Evaluation of a word list for sentiment analysis in microblogs (1103.2903v1)

Published 15 Mar 2011 in cs.IR and cs.CL

Abstract: Sentiment analysis of microblogs such as Twitter has recently gained a fair amount of attention. One of the simplest sentiment analysis approaches compares the words of a posting against a labeled word list, where each word has been scored for valence, -- a 'sentiment lexicon' or 'affective word lists'. There exist several affective word lists, e.g., ANEW (Affective Norms for English Words) developed before the advent of microblogging and sentiment analysis. I wanted to examine how well ANEW and other word lists performs for the detection of sentiment strength in microblog posts in comparison with a new word list specifically constructed for microblogs. I used manually labeled postings from Twitter scored for sentiment. Using a simple word matching I show that the new word list may perform better than ANEW, though not as good as the more elaborate approach found in SentiStrength.

Citations (1,275)

Summary

  • The paper introduces the AFINN lexicon, tailored for microblog sentiment analysis by incorporating contemporary internet slang.
  • The paper evaluates AFINN on AMT-labeled tweets, where it reaches a higher Pearson correlation with the labels (0.564) than ANEW (0.525).
  • The paper discusses traditional lexicon limitations and suggests integrating features like negation handling for future improvements.

Evaluation of a New Sentiment Word List for Microblog Analysis

The paper "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs" by Finn Årup Nielsen presents an empirical evaluation of sentiment word lists focused on their applicability to microblog data, specifically Twitter. It introduces and benchmarks a newly constructed lexicon, referred to as the AFINN lexicon, against established lists such as ANEW (Affective Norms for English Words), General Inquirer, OpinionFinder, and the sentiment analysis tool SentiStrength.

Background and Motivation

Sentiment analysis has evolved as a critical component in NLP, particularly for the real-time assessment of public opinion on social media platforms. Traditional methods leverage either supervised machine learning models trained on labeled datasets or rule-based systems utilizing sentiment lexicons. The latter are collections of words annotated with sentiment strength scores, used to determine the emotional valence of texts.

However, word lists like ANEW were developed before the proliferation of microblogging, potentially limiting their effectiveness due to the absence of contemporary internet slang and colloquial expressions prevalent on platforms such as Twitter. This research assesses whether a new, tailored lexicon can outperform these conventional lists in sentiment evaluation tasks.

Lexicon Construction

Work on the AFINN lexicon began in 2009, to monitor sentiment during the United Nations Climate Conference (COP15). The initial list of 1,468 words has since grown to 2,477 unique entries, including frequently used internet slang and strong obscene words. Each word was manually assigned an integer sentiment score between -5 (very negative) and +5 (very positive). Other dimensions such as arousal and dominance were left out to streamline the labeling process. Words were sourced from previous lexicons, internet slang dictionaries, Urban Dictionary, and inspection of how words are used in Twitter data.
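A lexicon of this kind is essentially a mapping from words or short phrases to integer valences. The sketch below shows one way to load and validate such a mapping. It assumes a tab-separated `word<TAB>score` text format; the function name `parse_lexicon` and the sample entries are invented for illustration and are not taken from the actual AFINN list.

```python
from typing import Dict


def parse_lexicon(text: str) -> Dict[str, int]:
    """Parse 'word<TAB>score' lines into a dict of integer valences.

    Assumption: one entry per line, with the score after the last tab.
    Scores outside the -5..+5 range described above are rejected.
    """
    lexicon: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # rsplit keeps multi-word phrases intact on the left-hand side
        word, raw_score = line.rsplit("\t", 1)
        score = int(raw_score)
        if not -5 <= score <= 5:
            raise ValueError(f"score out of range for {word!r}: {score}")
        lexicon[word] = score
    return lexicon


# Hypothetical entries, for illustration only.
sample = "good\t3\nbad\t-3\n\ncan't stand\t-3\n"
print(parse_lexicon(sample))
```

Keeping the range check in the loader means a mistyped score is caught when the list is read rather than silently skewing later results.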

Methodology

To evaluate the new lexicon, the paper used a dataset of 1,000 tweets labeled for sentiment through Amazon Mechanical Turk (AMT). Each tweet's ground-truth score is the average of ten separate annotations. A tweet's lexicon-based score was computed by matching its words against the word list and combining the matched valences in several ways, including a plain sum and normalized variants. The same procedure was applied to ANEW and the other lexicons for comparison.
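Word-matching scoring of this kind can be sketched in a few lines. The tokenizer, the function names, and the three combination rules below are illustrative assumptions rather than the paper's exact schemes, and the tiny lexicon is made up.

```python
import re
from typing import Dict, List

# Assumption: a simple lowercase word tokenizer; the paper's tokenizer may differ.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def score_sum(tokens: List[str], lexicon: Dict[str, int]) -> float:
    """Plain sum of the valences of all matched words."""
    return float(sum(lexicon.get(tok, 0) for tok in tokens))


def score_mean_all(tokens: List[str], lexicon: Dict[str, int]) -> float:
    """Sum of valences divided by the total number of tokens."""
    return score_sum(tokens, lexicon) / len(tokens) if tokens else 0.0


def score_mean_matched(tokens: List[str], lexicon: Dict[str, int]) -> float:
    """Sum of valences divided by the number of tokens found in the lexicon."""
    matched = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(matched) / len(matched) if matched else 0.0


# Hypothetical lexicon entries, for illustration only.
lexicon = {"good": 3, "awful": -3, "lol": 2}
tokens = tokenize("lol this movie was good, not awful")
print(score_sum(tokens, lexicon))
print(score_mean_all(tokens, lexicon))
print(score_mean_matched(tokens, lexicon))
```

Note that plain word matching counts "awful" as negative even after "not", which is the kind of case negation handling is meant to address.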

Pearson and Spearman correlations between lexicon-derived sentiment scores and AMT labels were computed to quantify performance. The paper also explored the impact of lexicon intersection, where words common to both AFINN and ANEW were used to isolate the effects of lexicon size and individual word scoring.
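For reference, both correlation measures can be computed without external libraries. This is a minimal sketch using the standard definitions; the function names are chosen for illustration, the input values are made up, and ties in the Spearman ranking receive average ranks.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up lexicon scores and human labels, for illustration only.
lexicon_scores = [2.0, -1.0, 0.0, 3.0, -2.0]
human_labels = [6.5, 4.0, 5.0, 7.5, 2.5]
print(pearson(lexicon_scores, human_labels))
print(spearman(lexicon_scores, human_labels))
```

Pearson measures linear agreement with the averaged labels, while Spearman only asks whether the two scorings order the tweets the same way, so reporting both guards against a result that depends on the scale of the scores.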

Results

The findings indicate that the AFINN lexicon yields a higher Pearson correlation with the AMT labels (0.564) than ANEW (0.525), and a higher Spearman rank correlation as well. Despite their larger vocabularies, General Inquirer and OpinionFinder underperformed both AFINN and ANEW, likely because they encode polarity rather than sentiment strength. SentiStrength, which adds features such as negation detection and emoticon handling, achieved the highest correlation (0.610).

The research highlights the importance of contextual relevance in lexicons, demonstrating that the inclusion of internet-specific vernacular can enhance performance. However, it also notes that ANEW’s psycholinguistic validation still renders it more suitable for scientific studies outside the microblogging context.

Discussion and Future Work

The investigation underscores the specific requirements of sentiment analysis on microblog data. Although the AFINN lexicon improved on ANEW, it did not match the more elaborate SentiStrength tool. Future work could add techniques such as negation handling, emoticon handling, and sensitivity to context to improve performance further.
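As a rough illustration of what negation handling could add to plain word matching, the sketch below flips the valence of a word that directly follows a negator. This is a deliberately naive rule written for this summary; it is not SentiStrength's algorithm and not a method from the paper, and the negator set and lexicon entries are hypothetical.

```python
from typing import Dict, List

# Hypothetical negator list; a real system would need a broader one.
NEGATORS = frozenset({"not", "no", "never"})


def score_with_negation(tokens: List[str], lexicon: Dict[str, int]) -> int:
    """Sum word valences, flipping the sign of a word right after a negator."""
    total = 0
    for i, tok in enumerate(tokens):
        valence = lexicon.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            valence = -valence  # naive rule: only the next word is affected
        total += valence
    return total


# Made-up lexicon entries, for illustration only.
lexicon = {"good": 3, "bad": -3}
print(score_with_negation(["this", "is", "not", "good"], lexicon))
print(score_with_negation(["this", "is", "good"], lexicon))
```

A one-word window misses constructions such as "not very good", which is one reason more elaborate tools use richer rules.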

Furthermore, the evolution of performance with increasing lexicon size suggests that continued expansion and refinement could yield incremental improvements. Future directions could also include automated methods for dynamic lexicon expansion grounded in real-time social media data streams.

Conclusion

Nielsen's paper contributes a lexicon built specifically for microblog sentiment analysis, together with an evaluation against established word lists. The findings suggest that adding contemporary internet language to a sentiment lexicon can improve agreement with human judgments, and they point to negation and emoticon handling as natural next steps.
