
Sentiment Lexicons: Methods & Uses

Updated 10 September 2025
  • Sentiment lexicons are curated resources that assign sentiment orientations to words through methods like manual annotation, corpus analysis, and machine learning.
  • They support a wide range of applications from basic polarity classification to domain-specific opinion mining and multilingual NLP.
  • Integration of lexicons as features in machine learning and ensemble methods enhances model explainability, coverage, and classification accuracy.

A sentiment lexicon is a curated resource that assigns sentiment orientation—such as positive, negative, or emotion categories—to words, phrases, or concepts. These lexicons underpin a substantial fraction of sentiment analysis workflows, ranging from basic polarity classification to domain-specific opinion mining and multilingual natural language processing. Lexicon creation leverages human annotation, bootstrapping algorithms, statistical corpus analysis, graph-theoretic propagation, and, increasingly, neural embedding and model-interpretability frameworks. Their structure, coverage, construction methodologies, and integration strategies profoundly influence the efficacy and adaptability of sentiment-oriented text analytics.

1. Construction Methodologies of Sentiment Lexicons

Multiple methodologies exist for constructing sentiment lexicons, often determined by the intended language, domain specificity, and required coverage:

  • Manual Annotation: Human experts assign sentiment labels (sometimes along a continuous scale) to candidate words, often using context-rich examples to disambiguate. The Economic Lexicon (EL) is a high-precision resource built by extracting modifiers via dependency parsing from economic corpora and assigning sentiment scores in $[-1,1]$ via 10 annotators per term (Barbaglia et al., 21 Nov 2024). Similarly, the NRC Emotion Lexicon assigns binary associations for words across emotion and sentiment categories.
  • Corpus-Based and Seed-Bootstrapping Approaches: These methods automatically expand a small seed set of manually annotated words. The NRC Hashtag Sentiment Lexicon and the Sentiment140 Lexicon employ pointwise mutual information (PMI) to associate terms with positive or negative sentiment using weakly labeled corpora (e.g., tweets with sentiment-word hashtags or emoticons): $\text{score}(w) = \text{PMI}(w, \text{positive}) - \text{PMI}(w, \text{negative})$, yielding large-coverage lexicons over unigrams, bigrams, and noncontiguous n-grams (Mohammad et al., 2013). A minimal sketch of this induction appears after this list.
  • Dictionary/Graph-Based Propagation: Lexicons such as LexiPers leverage synonymy, antonymy, or hypernym/hyponym relations within lexical ontologies (e.g., WordNet, GermaNet, or FarsNet). Annotated seeds propagate polarity through strongly connected components, employing dependency constraints or bootstrapped expansion, with optional refinement using PMI statistics (Sabeti et al., 2019).
  • Translation and Cross-Lingual Embeddings: For low-resource target languages, automatic translation of established sentiment lexicons can bootstrap coverage. Methods based on bilingual word embeddings map source-language vectors to target-language space via a linear transformation $W$:

$$y \approx Wx, \qquad \min_W \sum_i \|Wx_i - y_i\|^2$$

and then assign translations using cosine proximity (Rouvier et al., 2016). Manual correction is often required for lexical idiosyncrasies (e.g., IgboSentilex (Ogbuju et al., 2020); multilingual expansions with language-specific sentiment adjustment (Malinga et al., 6 Nov 2024)). A least-squares sketch of this mapping also follows the list.

  • Machine Learning and Model-Based Extraction: Recent advances obtain lexicons by interpreting large neural models. XLex, for example, uses transformers fine-tuned on labeled data and SHapley Additive exPlanations (SHAP) to extract words with robustly positive or negative attributions in the model (Rizinski et al., 2023). Crowdsourcing can be used for direct annotation at scale, especially for “pure emotion” lexicons (Haralabopoulos et al., 2017).
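As a concrete illustration of the PMI-based induction referenced in the corpus-based bullet above, here is a minimal sketch; the document-level counting, the smoothing constant, and the frequency cutoff are illustrative assumptions, not the exact settings of the cited lexicons.

```python
import math
from collections import Counter

def pmi_lexicon(docs, min_count=5, alpha=0.5):
    """Induce a sentiment lexicon from weakly labeled documents.

    docs: iterable of (tokens, label) pairs with label in {"positive", "negative"},
          e.g., tweets weakly labeled by emoticons or sentiment hashtags.
    Returns {word: score} with score(w) = PMI(w, positive) - PMI(w, negative),
    which algebraically reduces to a smoothed log-odds ratio of the
    class-conditional document frequencies.
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in docs:
        counter = pos if label == "positive" else neg
        for w in set(tokens):  # count document-level co-occurrence once per doc
            counter[w] += 1
        if label == "positive":
            n_pos += 1
        else:
            n_neg += 1

    lexicon = {}
    for w in set(pos) | set(neg):
        cp, cn = pos[w], neg[w]
        if cp + cn < min_count:  # drop rare, unreliable terms
            continue
        # PMI(w, pos) - PMI(w, neg) simplifies to log P(w|pos) - log P(w|neg)
        score = math.log((cp + alpha) / (n_pos + alpha)) \
              - math.log((cn + alpha) / (n_neg + alpha))
        lexicon[w] = score
    return lexicon
```

Positive scores indicate association with the positive class; the magnitude can be thresholded or used directly as a real-valued feature.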
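The translation matrix $W$ from the cross-lingual bullet can be fit in closed form by least squares over a bilingual seed dictionary. A minimal numpy sketch, assuming row-wise embedding matrices and omitting refinements such as orthogonality constraints:

```python
import numpy as np

def fit_translation_matrix(X, Y):
    """Solve min_W sum_i ||W x_i - y_i||^2 in closed form.

    X: (n, d_src) source-language embeddings of seed-dictionary words.
    Y: (n, d_tgt) target-language embeddings of their translations.
    """
    # np.linalg.lstsq solves X @ A ≈ Y for A, so W = A.T maps source to target
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A.T

def nearest_translation(word_vec, W, tgt_vocab, tgt_matrix):
    """Return the target word whose embedding is most cosine-similar to W @ word_vec."""
    mapped = W @ word_vec
    sims = tgt_matrix @ mapped / (
        np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(mapped) + 1e-9
    )
    return tgt_vocab[int(np.argmax(sims))]
```

Sentiment scores from the source lexicon are then transferred to the retrieved target-language words, typically followed by the manual correction noted above.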

2. Coverage, Structure, and Scoring Paradigms

Lexicons vary substantially by their unit of analysis (word, phrase, multiword expression), coverage, and sentiment coding scheme:

  • Coverage: Ranges from low (narrow domain, manually annotated; e.g., Loughran–McDonald for finance), to moderate (EL: 6,670 terms for economics (Barbaglia et al., 21 Nov 2024)), to high (SentiWords: ~155,000 lemma#PoS entries (Gatti et al., 2015)), and even multilingual resources spanning major and low-resource languages (Malinga et al., 6 Nov 2024).
  • Granularity of Annotation:
    • Binary/Categorical: Positive/negative/neutral labeling.
    • Fine-Grained or Continuous: Scores in $[-1,1]$ (EL, SentiWords).
    • Multidimensional: NRC Emotion Lexicon and crowdsourced emotion lexicons encode presence/absence of eight Plutchik emotions and polarity (Bellot et al., 2020, Haralabopoulos et al., 2017); economic lexica offer continuous scales.
  • Lexicon Table Structure (as seen in result sections):

| Word | Anticipation | Anger | Sadness | Joy | ... | Positive | Negative |
|------|--------------|-------|---------|-----|-----|----------|----------|
| abandon | 0 | 0 | 1 | 0 | ... | 0 | 1 |

A large lexicon allows improved coverage of real-world datasets (e.g., SentiWords’ ~90% coverage on SemEval news headlines vs. much lower for GI/ANEW (Gatti et al., 2015)).
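Coverage figures like these are straightforward to compute; a minimal sketch (the cited papers' exact protocols may differ), reporting both document-level and token-level rates:

```python
def coverage(docs, lexicon):
    """docs: list of token lists; lexicon: set or dict of lexicon entries.

    Returns (fraction of documents with at least one lexicon hit,
             fraction of all tokens that are lexicon entries).
    """
    hit_docs = hit_tokens = total_tokens = 0
    for tokens in docs:
        hits = sum(1 for t in tokens if t in lexicon)
        hit_tokens += hits
        total_tokens += len(tokens)
        if hits:
            hit_docs += 1
    return hit_docs / max(len(docs), 1), hit_tokens / max(total_tokens, 1)
```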

3. Integration of Lexicons into Sentiment Analysis Systems

  • Feature Engineering: Lexicon scores serve as direct features or aggregation statistics in machine learning models (summed, counted, maximum, or partitioned by context) as in the NRC-Canada SVM system (Mohammad et al., 2013). Integration with other features (n-grams, stylistic, POS-specific features) and handling of special contexts (e.g., hashtags, or negated forms marked by appending "_NEG") is standard; see the aggregation sketch after this list.
  • Hybrid and Ensemble Methods: Lexicon features are often incorporated alongside or as input to supervised classifiers (SVM, RF, logistic regression, XGBoost) (Kolchyna et al., 2015, Raees et al., 19 Sep 2024), and, in ensemble configurations, can boost performance beyond lexicon-only or model-only baselines (e.g., lexicon score as a feature yields measurable F1 improvements (Kolchyna et al., 2015)).
  • Handling Class Imbalance: Lexicon-based signals can interact with cost-sensitive classifiers (e.g., cost-adjusted SVM with class-weighted misclassification penalties) to improve minority class detection (up to 7% F1 gain in Twitter datasets (Kolchyna et al., 2015)).
  • Explainability and Real-Time Applications: Methods like XLex leverage model-derived word importances to build interpretable lexicons for real-time decision support, with speed-ups of up to ~87× compared to transformers on commodity hardware (Rizinski et al., 2023).
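A minimal sketch of lexicon-score aggregation in the spirit of the NRC-Canada features; the exact feature set and the sign-flip treatment of "_NEG" tokens are simplifying assumptions (the original system used separate lexicons for affirmative and negated contexts).

```python
def lexicon_features(tokens, lexicon):
    """tokens: pre-tagged tokens, where negated-context tokens carry a "_NEG" suffix.
    lexicon: {word: real-valued sentiment score}.
    Returns aggregate features for a downstream classifier.
    """
    scores = []
    for t in tokens:
        base, negated = (t[:-4], True) if t.endswith("_NEG") else (t, False)
        s = lexicon.get(base)
        if s is not None:
            scores.append(-s if negated else s)  # crude polarity flip under negation
    positives = [s for s in scores if s > 0]
    return {
        "count_pos": len(positives),            # tokens with positive score
        "sum_score": sum(scores),               # total sentiment mass
        "max_score": max(scores, default=0.0),  # strongest single signal
        "last_pos_score": positives[-1] if positives else 0.0,
    }
```

These features are then concatenated with n-gram and stylistic features before training, e.g., an SVM.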

4. Adaptation Across Languages, Domains, and Time

  • Domain Adaptation and Context Awareness: Context-dependent polarity is a key challenge. Senti-DD encodes direction-dependent sentiment shifts (e.g., "profit increases" positive, "profit decreases" negative) by constructing word pairs and quantifying context using PMI (Park et al., 2021); a toy sketch of such direction-dependent pairs follows this list. DSG graph-based inference detects domain polarity changes via Markov Random Fields on word co-occurrence graphs, adjusting prior lexicon labels for in-domain accuracy (Wang et al., 2020).
  • Multilingual and Low-Resource Lexicons: Robust pipelines combine translation (Google API, LLMs), cross-lingual embedding mapping, and manual curation, assigning language-specific sentiment scores to preserve cultural/linguistic nuance, as in (Malinga et al., 6 Nov 2024) for African LRLs, with machine learning and contextual BERT/XAI integration for model transparency and accuracy.
  • Historical and Community-Specific Shifts: Temporal induction frameworks demonstrate that more than 5% of sentiment-bearing words can reverse polarity over 150 years (amelioration/pejoration) (Hamilton et al., 2016). Community-specific sentiment lexicons likewise capture fine-grained variation in sentiment across communities (e.g., Reddit subreddits).
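A toy sketch of the direction-dependent idea: the seed lists and pair table below are hand-written for illustration, whereas Senti-DD induces such pairs automatically from labeled corpora via PMI.

```python
# Hypothetical seed lists; illustrative only.
UP = {"increase", "increases", "rise", "rises", "growth"}
DOWN = {"decrease", "decreases", "fall", "falls", "decline"}

# Direction-dependent polarity: (target, direction) -> sentiment.
PAIR_POLARITY = {
    ("profit", "up"): +1, ("profit", "down"): -1,
    ("debt", "up"): -1,   ("debt", "down"): +1,
}

def direction_score(tokens, window=3):
    """Score a sentence by (target, direction) pairs co-occurring in a window."""
    score = 0
    for i, t in enumerate(tokens):
        for d in tokens[max(0, i - window): i + window + 1]:
            if d in UP and (t, "up") in PAIR_POLARITY:
                score += PAIR_POLARITY[(t, "up")]
            elif d in DOWN and (t, "down") in PAIR_POLARITY:
                score += PAIR_POLARITY[(t, "down")]
    return score

print(direction_score("profit increases this quarter".split()))  # +1
print(direction_score("profit decreases this quarter".split()))  # -1
```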

5. Evaluation, Empirical Impact, and Limitations

  • Empirical Validation: Evaluation metrics include accuracy, macro/micro-averaged F1, mean absolute error (MAE), and coverage rates (see the short metrics sketch after this list). For example, SentiWords using an ensemble regression achieves significantly lower MAE and higher correlation than single-formula SWN metrics (Gatti et al., 2015). Application-specific studies show that including emotion categories alongside polarity enhances classification precision and recall, as in Amazon book review analysis (Bellot et al., 2020).
  • Model Performance: On large-scale datasets (e.g., Twitter: 1.6M tweets (Raees et al., 19 Sep 2024)), lexicon-based preprocessing and feature selection drive classification accuracy, e.g., Random Forest achieving ~81%. In personality assessment on social profiles, lexicon-derived sentiment scores accurately distinguish evaluative language (Raees et al., 19 Sep 2024).
  • Coverage/Noise Trade-Off: Larger lexicons increase recall but risk ambiguity and noise (e.g., OL + EMO + AUTO may decrease performance vs. curated OL + EMO (Kolchyna et al., 2015)), supporting the principle that lexicon quality, not just size, determines impact.
  • Resource Intensiveness and Cross-Domain Robustness: Human annotation yields high precision but is laborious. Automatically derived or translated lexicons extend reach but require strategies (embedding alignment, graph propagation, XAI) to maintain quality in new languages or domains. Resource intensity is a limiting factor for continuous reannotation (e.g., EL human curation (Barbaglia et al., 21 Nov 2024)).
  • Interpretability and Model Interaction: While deep models like BERT substantially outperform classical rule-based/lexicon methods in macro F1 (demonstrated with gains of up to 27 percentage points (Razova et al., 2021)), attention analysis reveals that 75% of attention heads attend statistically significantly more to sentiment-lexicon tokens, corroborating the enduring role of lexicons even in neural models. Model-explainable lexicons such as XLex reconcile real-time interpretability and performance (Rizinski et al., 2023).
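For concreteness, the classification- and regression-style metrics above can be computed with scikit-learn on toy data (illustrative values only, not drawn from the cited studies):

```python
from sklearn.metrics import f1_score, mean_absolute_error

# Categorical predictions: macro-averaged F1 weights all classes equally.
y_true = ["pos", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "pos", "pos", "neu"]
print(f1_score(y_true, y_pred, average="macro"))

# Continuous lexicon scores in [-1, 1] evaluated against gold ratings via MAE.
s_true, s_pred = [0.8, -0.5, 0.1], [0.6, -0.7, 0.3]
print(mean_absolute_error(s_true, s_pred))
```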

6. Emerging Directions and Applications

  • Aspect and Contextual Extensions: Aspect-based and target-sensitive sentiment lexicons (e.g., context-tagged, direction-dependent word pairs (Park et al., 2021)) represent an important growth area to address compositional semantics.
  • Crowdsourcing and Emotion Diversity: Scalable crowdsourced workflows now produce rich emotion lexicons with dyads (beyond polarity), capturing nuanced affective expression and modifier phenomena (e.g., intensification, negation, multi-emotion terms (Haralabopoulos et al., 2017)).
  • Domain-Specific and Multilingual Expansion: There is an ongoing trend and need to extend coverage to low-resource languages and specialized domains (medicine, finance, economics, governance), using translation, embedding transfer, and model explainability to ensure culturally and contextually relevant sentiment annotation (Malinga et al., 6 Nov 2024, Barbaglia et al., 21 Nov 2024).
  • Evaluation and Calibration: Future advances may focus on dynamic or aspect-based lexicons, continuous reannotation strategies, and improved sentence-level calibration, using gold-standard datasets for fine-tuning and adversarial domain adaptation (Barbaglia et al., 21 Nov 2024).

The construction, integration, and continuous adaptation of sentiment lexicons remain central to state-of-the-art sentiment analysis. Advances in lexicon-induction algorithms, cross-lingual transfer, and explainable model extraction have significantly expanded their utility, while ongoing research addresses context sensitivity, cultural adaptability, and the pragmatic trade-off between coverage, interpretability, and empirical precision across languages and domains.