Emotional Compositionality in MWEs

Updated 14 December 2025

The study shows that simple compositional models effectively predict Valence in MWEs, yet significant non-compositional cases emerge with idioms.
Empirical analysis using NRC VAD Lexicon v2 reveals extremely high annotation consistency and a strong linear correlation for Valence scores.
Findings underscore the need for advanced, context-aware models to address non-linear emotional cues in multiword expressions.

Emotional compositionality in multiword expressions (MWEs) concerns the extent to which the emotional properties of an expression, quantified primarily via Valence (V), Arousal (A), and Dominance (D), can be predicted from its constituent words. The topic marks the boundary between compositional and idiomatic language and bears on computational emotion modeling, lexicon construction, and psychological semantics. The NRC VAD Lexicon v2 provides the largest dataset to date of direct human ratings on all three VAD dimensions, covering more than 10,000 English MWEs and 44,928 single words, which enables robust empirical investigation of these compositional phenomena (Mohammad, 25 Nov 2025).

1. Annotation Protocol and Data Reliability

The NRC VAD Lexicon v2 extended coverage to 10,073 MWEs drawn from the 10,500 most frequent expressions in Muraki et al. (2023), and to 25,089 unigrams previously absent from version 1. Annotation employed seven-point scales ($-3$ to $+3$) for each VAD dimension, with explicit instructions distinguishing these dimensions from evaluative sentiment. Each item received roughly eight judgments on average (V = 7.83, A = 7.96, D = 8.06), with quality control based on interspersed “gold” items and exclusion of annotators scoring below 80% accuracy on them.

Split-Half Reliability (SHR) was assessed for each dimension by randomly splitting each item's ratings into two halves, averaging each half, and correlating the two sets of averages, repeated over 1,000 iterations: Valence (Spearman $\rho$ / Pearson $r$ = 0.98/0.99), Arousal (0.97/0.98), and Dominance (0.96/0.96). SHR values above 0.95 across all dimensions demonstrate extremely high annotation consistency. The final resource includes 10,205 MWEs and 44,928 words, with VAD scores linearly mapped to $[-1, +1]$.
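The split-half procedure and the linear rescaling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy ratings, and the choice to report only Pearson $r$ (Spearman $\rho$ is omitted to keep the sketch dependency-free) are all assumptions.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(ratings, iterations=1000, seed=0):
    """Average correlation between per-item means of two random rating halves.

    ratings: dict mapping an item to its list of raw ratings (at least two each).
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(iterations):
        first_half, second_half = [], []
        for scores in ratings.values():
            shuffled = list(scores)
            rng.shuffle(shuffled)
            cut = len(shuffled) // 2
            first_half.append(mean(shuffled[:cut]))
            second_half.append(mean(shuffled[cut:]))
        correlations.append(pearson(first_half, second_half))
    return mean(correlations)


def rescale(score, low=-3.0, high=3.0):
    """Linearly map a score from [low, high] onto [-1, +1]."""
    return 2.0 * (score - low) / (high - low) - 1.0


# Toy ratings on the -3..+3 scale (made up for illustration, not lexicon data).
toy_ratings = {
    "item_a": [3, 2, 3, 2, 3, 2, 3, 2],
    "item_b": [-2, -3, -2, -2, -3, -2, -3, -2],
    "item_c": [0, 1, 0, 0, -1, 0, 1, 0],
    "item_d": [-3, -3, -3, -2, -3, -3, -3, -3],
}
shr = split_half_reliability(toy_ratings, iterations=200)
print(f"split-half reliability (Pearson): {shr:.3f}")
print("mean rating of item_a on [-1, +1]:", rescale(mean(toy_ratings["item_a"])))
```

With only four toy items the resulting correlation is not meaningful in itself; the point is the shape of the computation (split, average, correlate, repeat).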

2. Formal Models of Emotional Compositionality

Emotional compositionality is operationalized via predictive models that estimate the overall VAD of an MWE from its constituents' scores.

Unweighted Mean Model: For a two-word MWE,

$\hat{V}_{\text{mwe}} = \frac{V_1 + V_2}{2},\quad \hat{A}_{\text{mwe}} = \frac{A_1 + A_2}{2},\quad \hat{D}_{\text{mwe}} = \frac{D_1 + D_2}{2}$

Weighted Linear Model:

$\hat{V}_{\text{mwe}} = \alpha V_1 + (1 - \alpha)V_2,\quad 0 \le \alpha \le 1$

Analogously for A and D.

Extremal and Position-Only Models: Extremal models take the $\max$ or $\min$ of the constituent scores; position-only models use $V_1$ or $V_2$ alone as $\hat{V}_{\text{mwe}}$.

Model fit is evaluated via correlation between predicted and observed scores or root mean squared error (RMSE),

$\mathrm{RMSE}_V = \sqrt{\frac{1}{N} \sum_{i} (\hat{V}_{\text{mwe},i} - V_{\text{mwe},i})^2}$

No regression fitting for $\alpha$ or non-linear models was reported, suggesting an opportunity for future refinement.
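The models and the RMSE criterion in this section translate directly into code. The sketch below is illustrative only: the function names and the toy scores are assumptions, and nothing here reproduces results from the source.

```python
from math import sqrt


def mean_model(s1, s2):
    """Unweighted mean of the two constituent scores."""
    return (s1 + s2) / 2


def weighted_model(s1, s2, alpha):
    """Weighted linear model: alpha * s1 + (1 - alpha) * s2, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s1 + (1 - alpha) * s2


def max_model(s1, s2):
    """Extremal model taking the larger constituent score."""
    return max(s1, s2)


def min_model(s1, s2):
    """Extremal model taking the smaller constituent score."""
    return min(s1, s2)


def first_word_model(s1, s2):
    """Position-only model using the first constituent."""
    return s1


def second_word_model(s1, s2):
    """Position-only model using the second constituent."""
    return s2


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed scores."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return sqrt(squared / len(observed))


# Toy constituent scores and observed MWE scores (made up for illustration).
toy_pairs = [(0.6, 0.4), (-0.5, -0.3), (0.0, 0.1)]
toy_observed = [0.55, -0.45, 0.70]

models = {
    "mean": mean_model,
    "max": max_model,
    "min": min_model,
    "first word": first_word_model,
    "second word": second_word_model,
}
for name, model in models.items():
    predictions = [model(s1, s2) for s1, s2 in toy_pairs]
    print(f"{name}: RMSE = {rmse(predictions, toy_observed):.3f}")
```

The third toy item has near-neutral constituents but a clearly positive observed score, so every model here incurs a large error on it; that is the kind of non-compositional case the later sections discuss.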

3. Empirical Analysis of Compositionality in MWEs

Bigram MWEs (8,330 items) were binned into a 7×7 grid representing constituent VAD values, each rounded to one of seven classes from $-1.0$ to $+1.0$. For each $(c_1, c_2)$ grid cell, the mean observed MWE score $\mu(c_1, c_2)$ and the proportions $P_{\text{high}}$ (score $\geq +0.33$) and $P_{\text{low}}$ (score $\leq -0.33$) were computed.
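A minimal sketch of this binning follows. It assumes the seven classes are evenly spaced ($-1.0, -0.67, -0.33, 0, +0.33, +0.67, +1.0$) and that each constituent score is assigned to its nearest class; the source does not spell out the rounding rule, and the function names and toy triples are hypothetical.

```python
from collections import defaultdict

# Assumed class centres: seven evenly spaced values from -1.0 to +1.0.
CLASSES = [round(k / 3, 2) for k in range(-3, 4)]


def to_class(score):
    """Assign a score in [-1, +1] to its nearest class centre (assumed rule)."""
    return min(CLASSES, key=lambda c: abs(c - score))


def grid_stats(bigrams, high=0.33, low=-0.33):
    """Per-cell mean MWE score and shares of high / low MWE scores.

    bigrams: iterable of (score_word1, score_word2, score_mwe) triples.
    """
    cells = defaultdict(list)
    for s1, s2, s_mwe in bigrams:
        cells[(to_class(s1), to_class(s2))].append(s_mwe)

    stats = {}
    for cell, scores in cells.items():
        n = len(scores)
        stats[cell] = {
            "n": n,
            "mean": sum(scores) / n,
            "p_high": sum(s >= high for s in scores) / n,
            "p_low": sum(s <= low for s in scores) / n,
        }
    return stats


# Toy triples (made up for illustration, not lexicon data).
toy_bigrams = [
    (0.05, -0.10, 0.50),   # neutral constituents, positive MWE
    (0.00, 0.10, -0.40),   # neutral constituents, negative MWE
    (0.10, 0.00, 0.00),    # neutral constituents, neutral MWE
    (0.70, 0.60, 0.50),    # positive constituents, positive MWE
]
for cell, cell_stats in sorted(grid_stats(toy_bigrams).items()):
    print(cell, cell_stats)
```

Running the same function once per dimension (V, A, D) yields the three 7×7 grids; cells with no items are simply absent from the returned dictionary.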

Key findings for Valence:

$\mu(c_1, c_2)$ increases monotonically with both $c_1$ and $c_2$, indicating substantial compositionality (e.g., $\mu(+0.67, +0.67) \approx +0.45$; $\mu(-0.67, -0.67) \approx -0.47$).
Non-compositional cases nonetheless occur: $P_{\text{high}}(0,0)=1.66\%$ and $P_{\text{low}}(0,0)=4.79\%$ show that a small share of MWEs with neutral constituents are themselves clearly positive or negative.

Arousal and Dominance display similar, but markedly weaker, monotonic grid trends, indicating their compositionality is less pronounced than Valence.

4. Emotionality and Compositionality by MWE Type

MWE type labels were sourced from Muraki et al. (2023), which sort the expressions into idioms/fixed expressions, noun compounds, and verb–particle constructions:

MWE Type	Proportion
Noun Compounds	47.8%
Idioms	38.2%
Particle Verbs	14.0%

Valence-class distributions for each type:

Type	Strong–Neg	Mod–Neg	Slight–Neg	Neutral	Slight–Pos	Mod–Pos	Strong–Pos
Idioms	7.1%	13.4%	19.6%	30.3%	19.7%	8.6%	1.3%
Noun Compounds	3.7%	7.2%	18.4%	36.9%	20.5%	9.9%	3.4%
Particle Verbs	8.0%	9.8%	20.4%	37.5%	16.7%	5.9%	1.7%

Idioms exhibit a higher proportion of non-neutral valence ($\approx 69.7\%$) than noun compounds ($63.1\%$) and particle verbs ($62.5\%$). Summing the class columns, negative valence outweighs positive for idioms (40.1% vs. 29.6%) and particle verbs (38.2% vs. 24.3%), whereas noun compounds lean slightly positive (33.8% vs. 29.3%). For Arousal and Dominance, noun compounds tend to occupy higher A and D bins, while idioms and particle verbs skew lower. These breakdowns indicate that idioms are more likely to carry direct emotional cues, while the compositionality of Arousal and Dominance is both construction- and lexeme-sensitive.
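The aggregate shares for each MWE type can be recomputed from the valence-class table. The short sketch below copies the percentages from that table; the variable and function names are illustrative.

```python
# Valence-class percentages per MWE type, copied from the table above, ordered
# strong-neg, mod-neg, slight-neg, neutral, slight-pos, mod-pos, strong-pos.
DISTRIBUTIONS = {
    "Idioms": [7.1, 13.4, 19.6, 30.3, 19.7, 8.6, 1.3],
    "Noun Compounds": [3.7, 7.2, 18.4, 36.9, 20.5, 9.9, 3.4],
    "Particle Verbs": [8.0, 9.8, 20.4, 37.5, 16.7, 5.9, 1.7],
}


def summarize(percentages):
    """Total negative, positive, and non-neutral share (in percent)."""
    negative = sum(percentages[:3])
    positive = sum(percentages[4:])
    return {
        "negative": round(negative, 1),
        "positive": round(positive, 1),
        "non_neutral": round(negative + positive, 1),
    }


for mwe_type, percentages in DISTRIBUTIONS.items():
    print(mwe_type, summarize(percentages))
```

Comparing the `negative` and `positive` totals per type shows which way each construction leans, and `non_neutral` is simply 100% minus the neutral column.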

5. Implications for Computational Emotion Modeling

The empirical results suggest that Valence in MWEs is often, but not universally, predictable from constituent VAD values—simple compositional models achieve high, but not perfect, correspondence. A considerable proportion of MWEs, especially those with neutral or discordant constituents, exhibit strong non-compositionality, particularly apparent for idiomatic expressions and figures of speech.

Arousal and Dominance yield weaker compositional trends, indicating greater reliance on idiomatic, metaphorical, or constructional factors than on additive or positional contributions of constituents. This suggests that lexicon-based emotion models relying solely on mean-pooling or addition of constituent word scores risk missing significant idiomatic and non-linear effects.

6. Limitations and Open Questions

Key limitations of the NRC VAD v2 resource and its compositionality study include:

English-only coverage with annotators drawn from US and Canadian native speakers, precluding analysis of cross-linguistic and cross-cultural MWE emotionality variation.
Ratings are based on isolated terms, not contextualized sentence usage, and thus cannot address dynamic, context-dependent emotional shifts.
Only basic (mean- and extremal-based) composition models were evaluated; parameterized or non-linear models (e.g., regression with a fitted $\alpha$, assessed by $R^2$) remain open for investigation.

Future directions include fitting richer statistical and machine learning models, extension to higher-order MWEs (trigrams, etc.), context-aware compositionality analysis leveraging large-scale corpora or contextualized embeddings, and cross-linguistic studies to assess the universality of compositionality trends (Mohammad, 25 Nov 2025).
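As one example of the "richer" models mentioned here, the weight $\alpha$ of the weighted linear model could be fitted by least squares. The sketch below is hypothetical and not part of the source study: it uses the closed-form minimizer of $\sum_i (\alpha V_{1,i} + (1-\alpha) V_{2,i} - V_{\text{mwe},i})^2$, namely $\alpha^* = \sum_i d_i (V_{\text{mwe},i} - V_{2,i}) / \sum_i d_i^2$ with $d_i = V_{1,i} - V_{2,i}$, clipped to $[0, 1]$. The function name and toy data are assumptions.

```python
def fit_alpha(pairs, observed):
    """Least-squares alpha for the model alpha * s1 + (1 - alpha) * s2.

    pairs: list of (s1, s2) constituent scores; observed: list of MWE scores.
    The unconstrained optimum is clipped to [0, 1]; because the objective is a
    one-dimensional convex quadratic, clipping gives the constrained optimum.
    """
    numerator = 0.0
    denominator = 0.0
    for (s1, s2), y in zip(pairs, observed):
        d = s1 - s2
        numerator += d * (y - s2)
        denominator += d * d
    if denominator == 0.0:
        raise ValueError("alpha is not identifiable when s1 == s2 for every pair")
    return min(1.0, max(0.0, numerator / denominator))


# Toy data generated from a known weight (made up for illustration).
true_alpha = 0.3
toy_pairs = [(0.5, -0.1), (-0.4, 0.2), (0.9, 0.3)]
toy_observed = [true_alpha * s1 + (1 - true_alpha) * s2 for s1, s2 in toy_pairs]

print(f"recovered alpha: {fit_alpha(toy_pairs, toy_observed):.3f}")
```

A fitted $\alpha$ above 0.5 would suggest the first word dominates the expression's score, below 0.5 the second; fitting it separately per dimension or per MWE type is one possible extension.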

7. Resource Availability and Impact

The NRC VAD Lexicon v2, freely accessible online, supports computational and empirical work in natural language processing, psychology, digital humanities, and computational social science. Its direct human annotations provide gold-standard VAD ratings for both single words and over 10,000 MWEs, substantially expanding the empirical foundation available for modeling the emotional content and compositionality of multiword language phenomena (Mohammad, 25 Nov 2025). Such resources enable nuanced analysis of language emotion and have implications for text understanding, affective computing, and the development of more context-sensitive language technologies.

References (1)

Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions (2025)