AffectiveTweets Regression System
- AffectiveTweets Regression System is a framework that estimates continuous emotion intensity in tweets using innovative machine learning and regression approaches.
- It integrates diverse features including lexica, n-grams, word embeddings, deep neural representations, and stylometric markers to capture affect nuances.
- Evaluation via Pearson correlation on SemEval benchmarks demonstrates its effectiveness for applications in social media analysis and computational psychology.
The AffectiveTweets Regression System encompasses a suite of methodologies for estimating the real-valued intensity of emotions conveyed in tweets, with primary application to data produced for tasks such as the SemEval "Affect in Tweets" challenge. Central to these systems are machine learning pipelines that integrate extensive, heterogeneous feature engineering—including affect lexica, dense word embeddings, deep neural representations, and tweet-specific stylometric markers—and sophisticated regression/classification frameworks, such as L2-regularized SVR, ensemble methods, and mixture-of-experts architectures. The accuracy of such systems is assessed in terms of Pearson correlation (and related metrics) between predicted and gold-standard fine-grained emotion scores, which are typically derived via rigorous annotation schemes like best–worst scaling.
1. Problem Definition and Data Foundation
The core task addressed by AffectiveTweets Regression Systems is the mapping of a tweet to a continuous-valued emotion intensity $y \in [0, 1]$, either for basic emotions (anger, fear, joy, sadness) or valence, with models trained to minimize squared prediction error. Datasets are generated using emotion-focused sampling, yielding corpora on the order of 7,000 English tweets distributed among these emotions, with careful control over lexical and author overlap. Annotation employs best–worst scaling (BWS), where tweets are presented to annotators in maximally diverse 4-tuples. Annotator judgments are converted to a real-valued score $s(t) = \%\text{best}(t) - \%\text{worst}(t)$, where $\%\text{best}(t)$ and $\%\text{worst}(t)$ are the fractions of tuples in which tweet $t$ was judged most and least intense, respectively; scores are linearly rescaled to $[0, 1]$. Reliability estimates report split-half Pearson $r$ between 0.80 and 0.88 depending on emotion (Mohammad et al., 2017).
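The BWS-to-intensity conversion described above can be sketched as follows (a minimal illustration; the annotation format and function name are assumptions, not the authors' code):

```python
from collections import Counter

def bws_scores(annotations):
    """Convert best-worst scaling judgments to intensities in [0, 1].

    `annotations` is a list of (items, best, worst) triples: the 4-tuple
    of tweets shown to an annotator, plus the ones judged most and least
    intense.  Each tweet's raw score is the fraction of its appearances
    in which it was chosen best minus the fraction in which it was
    chosen worst, then rescaled from [-1, 1] to [0, 1].
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {t: ((best[t] - worst[t]) / shown[t] + 1.0) / 2.0 for t in shown}
```

Real BWS pipelines aggregate many annotators and tuples per tweet; the counting logic stays the same.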
2. Feature Engineering Paradigms
AffectiveTweets systems extract and fuse diverse feature sets, optimized for lexical, semantic, and syntactic coverage:
- N-gram Features: Tokenized tweet representations include binary indicators for word or character n-grams (unigram up to 4-gram for words; char 3–5-grams), with negations systematically marked.
- Word Embeddings: Dense representations, most notably 400-dim skip-gram word2vec (trained on 10M Twitter messages) or 300-dim GloVe vectors, provide aggregated semantic profiles for each tweet via element-wise averaging.
- Affect Lexicon Features: Scores from up to ten sentiment/affect lexica—e.g., AFINN, Bing Liu, MPQA, NRC-Affect-Intensity, NRC-EmoLex, NRC Hashtag Sentiment—are summed or counted per category for each tweet.
- Deep-Emoji and Neural Features: Pretrained semantically-rich embeddings (attention and softmax activations from Deep-Emoji, skip-thought 4,800-dim sentence vectors, and the 4,096-dim unsupervised sentiment neuron) capture compositional and emotive context, especially for short informal text.
- Stylometric Features: Counts of emoticons, part-of-speech classes, punctuation, average word length, and related measures further characterize tweet style and informality.
- Hashtag Intensity: Mean intensities of hashtagged emotion words (from Depeche Mood dictionary) quantify implicit self-annotation.
All features are concatenated into a high-dimensional tweet vector $\mathbf{x} \in \mathbb{R}^d$ (with $d$ determined by the union of all feature families), standardized, and input to the regression/classification components (Oota et al., 2019, Mohammad et al., 2017).
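As an illustration of this fusion step, the following sketch concatenates an averaged-embedding block, lexicon aggregates, and a few stylometric counts into one vector (the function name, argument shapes, and exact stylometric choices are assumptions for demonstration):

```python
import numpy as np

def tweet_vector(tokens, embeddings, lexicon, dim=400):
    """Fuse feature families from Section 2 into one dense vector.

    `embeddings` maps token -> length-`dim` np.ndarray (e.g. 400-dim
    Twitter word2vec); `lexicon` maps token -> affect score.
    """
    # 1) Dense semantics: element-wise average of known word vectors.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    emb = np.mean(vecs, axis=0) if vecs else np.zeros(dim)
    # 2) Affect lexicon: summed score plus count of matched entries.
    hits = [lexicon[t] for t in tokens if t in lexicon]
    lex = np.array([sum(hits), len(hits)], dtype=float)
    # 3) Simple stylometric markers: token count, mean word length,
    #    and exclamation/question punctuation count.
    sty = np.array([len(tokens),
                    np.mean([len(t) for t in tokens]) if tokens else 0.0,
                    sum(t in {"!", "?"} for t in tokens)], dtype=float)
    # Concatenate; standardization would happen downstream over the corpus.
    return np.concatenate([emb, lex, sty])
```

Sparse n-gram indicators would be appended analogously, typically via a hashed or vocabulary-indexed block.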
3. Modeling and Learning Architectures
Single Model Regression
Early reference implementations employ L2-regularized L2-loss SVR (LIBLINEAR), optimizing

$$\min_{\mathbf{w}} \; \frac{1}{2}\,\mathbf{w}^\top\mathbf{w} \;+\; C \sum_{i} \max\!\big(0,\; |y_i - \mathbf{w}^\top\mathbf{x}_i| - \epsilon\big)^2,$$

with $C$ tuned to maximize held-out Pearson $r$ (Mohammad et al., 2017). Alternative baselines include unigrams, n-grams, or embeddings as standalone feature sets.
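A held-out-Pearson model-selection loop in this spirit can be sketched with scikit-learn's `LinearSVR`, whose squared-epsilon-insensitive loss corresponds to the L2-loss SVR formulation (the grid of `C` values and the function name are assumptions):

```python
from scipy.stats import pearsonr
from sklearn.svm import LinearSVR

def fit_svr(X_tr, y_tr, X_dev, y_dev, Cs=(0.01, 0.1, 1.0, 10.0)):
    """Fit an L2-regularized L2-loss SVR, picking C by held-out Pearson r.

    loss="squared_epsilon_insensitive" matches the LIBLINEAR L2-loss
    objective; the C grid here is illustrative.
    """
    best_r, best_model = -2.0, None
    for C in Cs:
        model = LinearSVR(C=C, loss="squared_epsilon_insensitive",
                          epsilon=0.0, max_iter=10000)
        model.fit(X_tr, y_tr)
        r = pearsonr(y_dev, model.predict(X_dev))[0]
        if r > best_r:
            best_r, best_model = r, model
    return best_model
```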
Mixture-of-Experts ("Experts Model")
Recent advances center on a Mixture-of-Experts (MoE) ensemble, where $K$ experts $f_1, \ldots, f_K$ (e.g., Gradient Boosting, XGBoost, LightGBM, Random Forest, a shallow neural network) are independently pre-trained. An input-conditioned gating network (parameterized by $\theta$) outputs softmax-normalized weights $\alpha_k(\mathbf{x}; \theta)$:

$$\alpha_k(\mathbf{x}; \theta) = \frac{\exp\big(g_k(\mathbf{x}; \theta)\big)}{\sum_{j=1}^{K} \exp\big(g_j(\mathbf{x}; \theta)\big)}.$$

Final predictions are a convex combination:

$$\hat{y}(\mathbf{x}) = \sum_{k=1}^{K} \alpha_k(\mathbf{x}; \theta)\, f_k(\mathbf{x}).$$

Gating parameters $\theta$ are optimized (with expert weights fixed) to minimize the expectation of a per-sample squared error:

$$\mathcal{L}(\theta) = \mathbb{E}_{(\mathbf{x}, y)}\Big[\big(y - \hat{y}(\mathbf{x})\big)^2\Big].$$
Training employs gradient descent; stratified cross-validation and grid search are used for expert hyperparameters (Oota et al., 2019).
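The gating step, with expert predictions held fixed, can be sketched in plain NumPy (the linear gate and all hyperparameters below are illustrative assumptions, not the published architecture):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_gate(X, y, expert_preds, lr=0.2, epochs=1000):
    """Fit a linear softmax gate over K frozen experts.

    `expert_preds`: (n, K) matrix of each pre-trained expert's prediction
    per sample.  Minimizes mean squared error of the convex combination
    sum_k alpha_k(x) f_k(x) by full-batch gradient descent.
    """
    n, d = X.shape
    K = expert_preds.shape[1]
    W = np.zeros((d, K))  # gate parameters theta
    for _ in range(epochs):
        alpha = softmax(X @ W)                      # (n, K) gate weights
        yhat = (alpha * expert_preds).sum(axis=1)   # mixture prediction
        err = yhat - y                              # (n,)
        # d(yhat)/d(logit_k) = alpha_k * (f_k - yhat), via the softmax
        # Jacobian; chain rule with the squared-error derivative.
        g = alpha * (expert_preds - yhat[:, None])  # (n, K)
        W -= lr * X.T @ (err[:, None] * g) / n
    return W
```

With a well-performing expert in the pool, the gate learns to concentrate weight on it; in practice stratified cross-validation supplies out-of-fold `expert_preds` to avoid leakage.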
4. Evaluation Procedures and Benchmarking
For regression and ordinal classification tasks, Pearson's correlation $r$ is the principal evaluation metric, monitored both over the full range of gold intensities and over the moderate-to-high subset (gold intensity $\geq 0.5$). The methodology is grounded in the SemEval-2018 "Affect in Tweets" challenge:
- Subtasks: EI-reg (regression for emotion intensity), EI-oc (ordinal), V-reg (valence regression), V-oc (valence ordinal), E-c (multi-label emotion classification).
- Dataset Splits: Thousands of tweets per emotion, with independent train, dev, and test splits. Each tweet is independently scored for each emotion.
- Baselines and Comparisons: Official SVM-unigram baselines serve as reference; the Experts Model consistently outperforms by 20–30 correlation points on regression, and 10–15 points in accuracy/F1 for classification (Oota et al., 2019).
| Subtask | Experts Model Score | Top Performer | Baseline |
|---|---|---|---|
| EI-reg (macro-avg. $r$) | 0.753 (5th/48) | 0.799 | approx. 0.52 |
| EI-oc (macro-avg. $r$) | 0.636 (5th) | 0.695 | approx. 0.47 |
| V-reg ($r$) | 0.830 (7th) | – | – |
| V-oc ($r$) | 0.738 (10th) | – | – |
| E-c (Jaccard) | 0.578 (3rd) | – | – |
Performance for individual features: Deep-Emoji features yield the highest correlations for anger/fear; skip-thoughts and affect lexica follow, with stylometric and "unsupervised sentiment neuron" features least predictive (Pearson $r$ of roughly 0.30 or below).
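The two-track Pearson evaluation can be sketched as follows (the 0.5 threshold follows the SemEval-2018 protocol; the function name is an assumption):

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(gold, pred, threshold=0.5):
    """SemEval-style scoring: Pearson r over all tweets, plus over the
    subset whose gold intensity is >= `threshold` (moderate-to-high)."""
    gold, pred = np.asarray(gold), np.asarray(pred)
    r_all = pearsonr(gold, pred)[0]
    hi = gold >= threshold
    r_hi = pearsonr(gold[hi], pred[hi])[0]
    return r_all, r_hi
```

The restricted metric penalizes systems that rank low-intensity tweets well but blur distinctions among the strongly emotional ones.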
5. Ablation, Analysis, and Linguistic Insights
Feature ablation and analytical studies indicate that:
- Affect lexicon features confer the largest single boost in correlation over embeddings alone.
- Embeddings and lexicon features are synergistic: the combined WE+L configuration achieves the best average Pearson $r$, exceeding either feature family used individually (Mohammad et al., 2017).
- Removing n-grams has minimal effect once both lexicons and embeddings are present.
- Hashtag analysis: trailing emotion hashtags increase perceived intensity in 78.6% of instances (mean intensity 0.58 with hashtag vs. 0.47 without; significant under a Wilcoxon signed-rank test); the impact is generally positive but context-dependent.
- Cross-emotion transfer: models trained on one negative emotion generalize well to other negative emotions, whereas negative-to-positive transfer (e.g., anger $\to$ joy) yields negative or near-zero Pearson $r$ (Mohammad et al., 2017).
6. Systemic Impact and Applications
AffectiveTweets Regression Systems have been extensively evaluated in shared tasks (SemEval-2018 Task 1, etc.), setting a de facto benchmark for emotion intensity prediction in microtext. Incorporation of pretrained deep semantic features alongside domain-specific lexica enables these systems to capture nuanced, fine-grained affect expressions otherwise inaccessible to traditional sentiment analysis. They support applications in social media mining, computational psychology, e-retail, and market analytics, where precise quantification of emotion intensity is essential for downstream reasoning, tracking affect-laden trends, and human-centric decision support (Mohammad et al., 2017, Oota et al., 2019).
A plausible implication is that as these models further integrate contextual, temporally-aware, and user-specific features, performance on both cross-domain and longitudinal emotion prediction tasks may continue to improve. The consensus finding from ablation and transfer experiments is that affect-rich lexica remain foundational, even as neural representations mature.