Emotional Compositionality in MWEs

Updated 3 December 2025

The paper introduces a framework that compares human-rated valence, arousal, and dominance (VAD) scores of multiword expressions (MWEs) with compositional estimates derived from their constituent words.
It quantifies non-compositional deviations using Pearson correlations and RMSE, highlighting variations across noun compounds, idioms, and verb–particle constructions.
Implications for NLP include adopting hybrid sentiment analysis strategies that combine compositional predictions with direct MWE-level emotional ratings.

Emotional compositionality in multiword expressions (MWEs) refers to the extent to which the emotion-laden meaning of an expression can be predicted from the emotions associated with its constituent words. In the context of the NRC VAD Lexicon v2, this phenomenon is analyzed using the primary affective dimensions of valence (V), arousal (A), and dominance (D), based on large-scale human judgments. MWEs—spanning noun compounds, idioms, and verb–particle constructions—differ substantially in the degree to which their emotional impact is compositionally derived versus arising from non-compositional, often idiomatic, mechanisms. Understanding this compositionality is essential for NLP tasks such as sentiment analysis and emotion detection.

1. Resource Overview and Annotation Protocol

The NRC VAD Lexicon v2 provides human ratings for 10,073 English MWEs (predominantly bigrams) and 44,928 unigrams, for roughly 55,000 annotated entries in total. MWE types include noun compounds (48%), fixed/idiomatic expressions (29%), and verb–particle constructions (23%), with types assigned using metadata from concreteness norms. Each term, whether unigram or MWE, was rated on each VAD dimension on a 7-point Likert scale (from -3 for “very negative/inactive/submissive” to +3 for “very positive/active/dominant”). Ratings were rescaled to [-1, +1] and averaged across approximately 8 annotators per term (mean annotations per term: V = 7.83, A = 7.96, D = 8.06).
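The rescaling and averaging step can be sketched as follows. This is a minimal illustration, not the lexicon's actual pipeline: it assumes the rescaling is a plain linear map (division by 3), and the function names and the example ratings are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def rescale_likert(rating: int) -> float:
    """Map a 7-point rating in {-3, ..., +3} linearly onto [-1, +1].

    Assumption: the source only says scores were "rescaled"; a linear
    map is the simplest reading and is used here for illustration.
    """
    if not -3 <= rating <= 3:
        raise ValueError(f"rating out of range: {rating}")
    return rating / 3.0


def term_score(ratings: list[int]) -> float:
    """Average the rescaled ratings from all annotators of one term."""
    return mean(rescale_likert(r) for r in ratings)


# Hypothetical ratings from eight annotators for one term on one dimension.
example_ratings = [2, 3, 2, 1, 2, 3, 2, 2]
example_score = term_score(example_ratings)
```

Because the map is linear, rescaling each rating and then averaging gives the same result as averaging first and rescaling afterwards.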

Quality control employed “gold” questions, and split-half reliability analyses showed high consistency (Valence: Spearman’s $\rho=0.98$, Pearson’s $r=0.99$; Arousal: $\rho=0.97$, $r=0.98$; Dominance: $\rho=0.96$, $r=0.96$) (Mohammad, 25 Nov 2025).
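For readers unfamiliar with split-half reliability, the sketch below shows one common way to compute it: the annotations for each term are randomly divided into two halves, each half is averaged, and the two resulting score lists are correlated across terms. The details are assumptions rather than the paper's documented procedure (for example, reliability studies often average over many random splits, whereas a single split is used here), and the toy ratings are invented.

```python
from __future__ import annotations

import math
import random
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman correlation, computed as Pearson on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def split_half(ratings_by_term: dict[str, list[int]], rng: random.Random):
    """Return (Spearman rho, Pearson r) for one random split of annotators."""
    half_a, half_b = [], []
    for ratings in ratings_by_term.values():
        shuffled = ratings[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(mean(shuffled[:mid]))
        half_b.append(mean(shuffled[mid:]))
    return spearman(half_a, half_b), pearson(half_a, half_b)


# Invented ratings (8 annotators per term), for illustration only.
toy_ratings = {
    "sunny day": [2, 3, 2, 2, 3, 2, 3, 2],
    "rock bottom": [-2, -2, -1, -2, -3, -2, -2, -1],
    "coffee table": [0, 0, 1, 0, 0, 0, -1, 0],
    "power surge": [0, -1, 0, 1, 0, -1, 0, 0],
}
rho, r = split_half(toy_ratings, random.Random(0))
```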

2. Operational Definition of Emotional Compositionality

The emotional compositionality of MWEs is formalized as the agreement between the human-rated VAD score of an expression and a compositional prediction based on its constituents. Specifically, for a two-word MWE $e = w_1\,w_2$, the prediction is the mean of the constituent scores:

$\hat V(e) = \frac{V(w_1) + V(w_2)}{2},\quad \hat A(e) = \frac{A(w_1) + A(w_2)}{2},\quad \hat D(e) = \frac{D(w_1) + D(w_2)}{2}$

The (absolute) non-compositional deviation for each dimension is:

$\Delta X(e) = |X(e) - \hat X(e)|,\quad X \in \{V, A, D\}$

Aggregate fit over the $N$ evaluated MWEs $e_1, \dots, e_N$ is quantified via the Pearson correlation $r_X$ and the root mean square error (RMSE):

$r_X = \frac{ \sum_{i=1}^N (X(e_i) - \overline{X(e)}) (\hat X(e_i) - \overline{\hat X(e)}) }{ \sqrt{\sum_{i=1}^N (X(e_i) - \overline{X(e)})^2 \sum_{i=1}^N (\hat X(e_i) - \overline{\hat X(e)})^2} }$

$\mathrm{RMSE}_X = \sqrt{ \frac{1}{N} \sum_{i=1}^N \bigl(X(e_i) - \hat X(e_i)\bigr)^2 }$

This framework enables quantitative assessment of how well constituent emotions predict the MWE’s affect (Mohammad, 25 Nov 2025).
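The definitions above translate directly into code. The following sketch computes the compositional estimate, the absolute deviation, the Pearson correlation, and the RMSE for a single dimension. The function names are illustrative, and the constituent scores in the toy records are invented (the first two rows are chosen only so that their averages match the $\hat V$ values quoted in this article).

```python
from __future__ import annotations

import math
from statistics import mean


def compositional_estimate(x_w1: float, x_w2: float) -> float:
    """Predicted score of a two-word MWE: the mean of its constituents."""
    return (x_w1 + x_w2) / 2


def deviation(observed: float, predicted: float) -> float:
    """Absolute non-compositional deviation |X(e) - X_hat(e)|."""
    return abs(observed - predicted)


def pearson_r(observed: list[float], predicted: list[float]) -> float:
    """Pearson correlation between observed and predicted scores."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean square error between observed and predicted scores."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


# (score of w1, score of w2, human-rated MWE score); constituent values
# are hypothetical and serve only to exercise the functions.
toy_records = [
    (0.60, 0.52, 0.59),
    (0.10, -0.10, -0.62),
    (-0.50, -0.30, -0.45),
]
observed = [x for _, _, x in toy_records]
predicted = [compositional_estimate(a, b) for a, b, _ in toy_records]
deviations = [deviation(o, p) for o, p in zip(observed, predicted)]
fit_r = pearson_r(observed, predicted)
fit_rmse = rmse(observed, predicted)
```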

3. Experimental Methodology

For empirical evaluation, 8,330 bigram MWEs were placed into a $7 \times 7$ grid of bins according to their constituents' VAD scores, with each constituent’s score divided into 7 intervals over [-1, +1]. Within each bin, researchers computed (a) the mean human-rated MWE score, (b) the percentage of MWEs rated “high” (top 33% of scores), and (c) the percentage rated “low” (bottom 33%) on each dimension. The results were visualized as heatmaps, mapping compositional trends across the score space.
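A minimal sketch of this binning analysis for one dimension is given below. Several details are assumptions rather than facts from the source: the seven intervals are taken to be equal-width, the “high” and “low” cutoffs are taken from the terciles of the MWE scores in the sample itself, and the toy records are invented.

```python
from __future__ import annotations

from collections import defaultdict

N_BINS = 7


def bin_index(score: float, n_bins: int = N_BINS) -> int:
    """Index (0-based) of the equal-width interval of [-1, +1] holding score."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # A score of exactly +1 is clamped into the last bin.
    return min(int((score + 1.0) / 2.0 * n_bins), n_bins - 1)


def tercile_cutoffs(scores: list[float]) -> tuple[float, float]:
    """Approximate cutoffs separating the bottom and top thirds of scores."""
    ordered = sorted(scores)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def bin_statistics(records: list[tuple[float, float, float]]) -> dict:
    """Per-cell count, mean MWE score, and percentage of high/low MWEs.

    Each record is (score of w1, score of w2, human-rated MWE score).
    """
    low_cut, high_cut = tercile_cutoffs([x for _, _, x in records])
    cells = defaultdict(list)
    for x1, x2, x in records:
        cells[(bin_index(x1), bin_index(x2))].append(x)
    stats = {}
    for cell, values in cells.items():
        n = len(values)
        stats[cell] = {
            "n": n,
            "mean": sum(values) / n,
            "pct_high": 100.0 * sum(v >= high_cut for v in values) / n,
            "pct_low": 100.0 * sum(v < low_cut for v in values) / n,
        }
    return stats


# Invented records, for illustration only.
toy_records = [
    (0.60, 0.50, 0.59),
    (0.10, -0.10, -0.62),
    (0.00, 0.05, 0.02),
    (-0.50, -0.30, -0.45),
    (0.70, 0.60, 0.80),
    (0.05, 0.00, 0.70),
]
cell_stats = bin_statistics(toy_records)
```

The resulting dictionary maps each occupied `(row, column)` cell to its statistics, which is the form a heatmap plotting routine would typically consume.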

Reliability checks for MWE ratings mirrored the unigram analyses, with all split-half coefficients exceeding 0.95, confirming robustness of the annotation protocol (Mohammad, 25 Nov 2025).

4. Quantitative Findings

The primary findings demonstrate a monotonic compositional trend for all three VAD dimensions:

Valence: The mean MWE valence ($V(e)$) increases nearly linearly from lowest to highest constituent bin. The estimated Pearson correlation between predicted and actual valence is $r \approx 0.75\text{–}0.85$.
Arousal and Dominance: Trends are qualitatively similar but with shallower slopes, indicating lower compositionality. Estimated correlations are $r \approx 0.50\text{–}0.60$ for arousal and $r \approx 0.45\text{–}0.55$ for dominance.

Non-compositional effects remain notable: 1.66% of MWEs with both constituents neutral are rated “high” valence; 4.79% are “low” valence. Analogous non-compositionality occurs for arousal and dominance. Illustrative examples include:

Compositional: “sunny day” ($\hat V=0.56$, $V(\text{sunny day})=0.59$); “power surge” ($\hat A=0.47$, $A(\text{power surge})=0.50$)
Non-compositional: “rock bottom” ($\hat V=0.00$, $V(\text{rock bottom})=-0.62$); “breath of fresh air” ($\hat V\approx0.00$, $V(\text{breath of fresh air})=+0.71$) (Mohammad, 25 Nov 2025)
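As a quick worked check, applying the deviation $\Delta X(e) = |X(e) - \hat X(e)|$ from Section 2 to the values quoted for these examples gives:

$\Delta V(\text{sunny day}) = |0.59 - 0.56| = 0.03$

$\Delta A(\text{power surge}) = |0.50 - 0.47| = 0.03$

$\Delta V(\text{rock bottom}) = |-0.62 - 0.00| = 0.62$

$\Delta V(\text{breath of fresh air}) \approx |0.71 - 0.00| = 0.71$

On the [-1, +1] scale, the two non-compositional examples deviate from their compositional estimates by more than twenty times as much as the two compositional ones.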

5. Linguistic Patterns in MWE Emotionality

Valence exhibits the highest degree of compositionality, while dominance is least compositional. Noun compounds often possess compositional arousal and dominance, likely because the naming of events high in arousal/dominance tends to be transparent and concrete. By contrast, idioms and other fixed expressions are substantially less compositional, reflecting their conventional and often metaphorical origin. Negative/low-valence idioms are disproportionately non-compositional, with affect arising from cultural usage rather than constituent semantics.

The following table summarizes compositionality trends by MWE type:

MWE Type	Affective Dimension	Compositionality Level
Noun compounds	Arousal/Dominance	High
Fixed expressions	Valence	Low
Verb–particle constructions	Mixed	Intermediate

6. Implications for NLP and Computational Lexicons

Relying exclusively on word-level VAD in sentiment or emotion analysis can fail to capture a non-trivial fraction of strongly emotional MWEs. A plausible implication is that NLP systems need hybrid strategies: using direct MWE-level lookup for idiomatic expressions alongside compositional computations for more transparent compounds. The NRC VAD v2 lexicon provides necessary infrastructure for such hybrid approaches, offering resources applicable across NLP, psychology, digital humanities, and allied fields (Mohammad, 25 Nov 2025).
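One way such a hybrid strategy might look is sketched below: a direct MWE-level lookup is tried first, and the compositional average of word-level scores is used only as a fallback. This is an illustrative design, not an interface provided by the lexicon. The function name, the dictionary layout, and the word-level scores are hypothetical; only the “rock bottom” valence of -0.62 is taken from the examples in this article.

```python
from __future__ import annotations


def hybrid_valence(
    expression: str,
    mwe_lexicon: dict[str, float],
    word_lexicon: dict[str, float],
) -> tuple[float | None, str]:
    """Score an expression, preferring a direct MWE rating when available.

    Returns (score, source), where source is one of "mwe_lookup",
    "compositional", or "unknown".
    """
    key = expression.lower().strip()
    if key in mwe_lexicon:
        return mwe_lexicon[key], "mwe_lookup"
    words = key.split()
    scores = [word_lexicon[w] for w in words if w in word_lexicon]
    # Fall back to averaging only when every constituent has a rating.
    if not words or len(scores) != len(words):
        return None, "unknown"
    return sum(scores) / len(scores), "compositional"


# Toy lexicons; the word-level values are invented for illustration.
toy_mwe_lexicon = {"rock bottom": -0.62}
toy_word_lexicon = {"rock": 0.05, "bottom": -0.05, "sunny": 0.60, "day": 0.52}

idiom_result = hybrid_valence("Rock bottom", toy_mwe_lexicon, toy_word_lexicon)
compound_result = hybrid_valence("sunny day", toy_mwe_lexicon, toy_word_lexicon)
missing_result = hybrid_valence("purple haze", toy_mwe_lexicon, toy_word_lexicon)
```

Returning the source alongside the score lets a downstream system treat compositional estimates with more caution than direct ratings, for instance when the expression is likely to be idiomatic.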

7. Conclusions and Future Directions

The compositional analysis provided by NRC VAD v2 establishes quantitative benchmarks for emotional compositionality in English MWEs, identifying where word-level information suffices and where idiom-level ratings are indispensable. This lexicon offers reliable human emotional judgments for a large inventory of MWEs, enabling more nuanced and accurate computational models. Future research may extend these methodologies beyond English or develop refined models to address the limitations of purely additive compositionality, grounded in the empirical patterns revealed by this dataset (Mohammad, 25 Nov 2025).


References

Mohammad (2025). Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions.
