
Words of Warmth Lexicon

Updated 13 November 2025
  • The Words of Warmth Lexicon is a comprehensive suite capturing association norms for trust, sociability, and warmth across over 26,000 English words.
  • It employs rigorous annotation methods and robust reliability metrics, including split-half correlations, to ensure accurate social perception ratings.
  • The lexicon supports quantitative research on linguistic bias, stereotype analysis, and language development with actionable insights into social cognition.

The Words of Warmth Lexicon is a large-scale suite of association norms capturing perceived trust, sociability, and warmth for over 26,000 common English words. Based on social psychological theory, the lexicon facilitates quantitative analysis of the dimensions of interpersonal perception, enables developmental and applied investigations, and supports nuanced studies of linguistic bias and stereotypes. Trust (T) and Sociability (S) ratings are derived directly from human annotators; Warmth (W) is defined, for each word, as whichever of the two scores has the larger absolute value.

1. Theoretical Foundations and Dimensions

Competence (C) and Warmth (W) constitute the primary dimensions for social cognition, as formulated by the Stereotype Content Model (Fiske et al. 2002). Warmth—a measure of perceived intent, encompassing friendliness and hostility—is further decomposed by recent research (Abele et al. 2016; Koch et al. 2024) into two components:

  • Trust (T): Morality, honesty, integrity, sincerity, fairness.
  • Sociability (S): Friendliness, gregariousness, conviviality.

Formally, each word $i$ in the lexicon is indexed with three real-valued scores $T_i$, $S_i$, $W_i$ on $[-3, +3]$. Trust and Sociability are empirically established through annotation; Warmth is operationalized as:

$$W_i = \begin{cases} T_i & \text{if } |T_i| \ge |S_i| \\ S_i & \text{otherwise} \end{cases}$$
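The Warmth rule can be sketched in a few lines of Python. This is a minimal illustration of the definition, not the authors' code; the function name `warmth` and the example scores are hypothetical.

```python
def warmth(trust: float, sociability: float) -> float:
    """Return whichever of the two scores has the larger absolute value.

    Ties (|T| == |S|) resolve to Trust, following the >= in the definition.
    """
    return trust if abs(trust) >= abs(sociability) else sociability


# Hypothetical scores on the [-3, +3] scale, chosen only for illustration.
print(warmth(2.5, 1.0))    # Trust has the larger magnitude
print(warmth(-0.5, -2.0))  # Sociability has the larger magnitude
print(warmth(1.0, -1.0))   # equal magnitudes, so Trust is returned
```

Because the rule keeps the sign of the selected component, a word that is strongly negative on one dimension and mildly positive on the other ends up with a negative Warmth score.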

The evolutionary and developmental literature indicates that warmth-based judgments emerge early in childhood, and that sociability precedes trust in early language acquisition.

2. Lexicon Construction and Reliability

2.1 Term Selection

The source vocabulary comprises approximately 44,000 unigrams from the NRC VAD Lexicon v2, filtered to exclude terms with near-neutral valence ($-0.2 < \text{Valence} < +0.2$), yielding 26,188 emotionally salient unigrams.
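A rough sketch of this filtering step follows. It assumes valence scores are available as a word-to-score mapping with 0 as the neutral point; the function name `select_salient` and the toy valence values are hypothetical and are not taken from the NRC VAD Lexicon.

```python
def select_salient(valence_by_word: dict[str, float], threshold: float = 0.2) -> list[str]:
    """Keep words whose valence lies outside the open interval (-threshold, +threshold)."""
    return [word for word, v in valence_by_word.items() if abs(v) >= threshold]


# Toy valence values, invented for illustration only.
toy_valence = {"wedding": 0.9, "table": 0.05, "stalker": -0.8, "window": -0.1}
print(select_salient(toy_valence))
```

Words exactly at the boundary are kept here, since the stated exclusion interval is strict on both sides.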

2.2 Annotation Procedure

Ratings were crowdsourced via Amazon Mechanical Turk, restricting participation to native English speakers (69% USA, rest UK, Canada, India). Annotator demographics: mean age 39.2 years, 48% female and 52% male (self-reported). Each target was rated on 7-point bipolar scales for Trust and Sociability ($-3$ = "very untrustworthy/unsociable", $+3$ = "very trustworthy/sociable", $0$ = "neither"), with task instructions that explained the dimensions, gave examples, and prompted annotators to consult dictionaries for ambiguous items.

2.3 Quality Control and Aggregation

“Gold” control items (~2%) were used for real-time and silent accuracy feedback; annotators with sub-80% gold accuracy had their contributions excluded. Lexicon scores per word are aggregated as follows:

$$T_i = \frac{1}{n}\sum_{j=1}^n t_{ij}, \quad S_i = \frac{1}{n}\sum_{j=1}^n s_{ij}$$

Warmth $W_i$ is assigned according to the component with greater absolute value.
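The exclusion and averaging steps might look like the sketch below. The data layout (tuples of annotator, word, score), the function name `aggregate`, and the choice to drop annotators with no recorded gold accuracy are all assumptions made for illustration.

```python
from statistics import mean

GOLD_ACCURACY_THRESHOLD = 0.80  # annotators below this are excluded


def aggregate(ratings, gold_accuracy, threshold=GOLD_ACCURACY_THRESHOLD):
    """Average per-word ratings from annotators who pass the gold-accuracy check.

    ratings: iterable of (annotator_id, word, score) tuples.
    gold_accuracy: dict mapping annotator_id to accuracy on gold items.
    Annotators missing from gold_accuracy are treated as failing (an assumption).
    """
    per_word = {}
    for annotator, word, score in ratings:
        if gold_accuracy.get(annotator, 0.0) < threshold:
            continue
        per_word.setdefault(word, []).append(score)
    return {word: mean(scores) for word, scores in per_word.items()}


# Toy ratings on the -3..+3 scale, invented for illustration.
toy_ratings = [
    ("a1", "wedding", 3), ("a2", "wedding", 2), ("a3", "wedding", -3),
    ("a1", "stalker", -3), ("a2", "stalker", -3),
]
toy_gold = {"a1": 0.95, "a2": 0.85, "a3": 0.60}
print(aggregate(toy_ratings, toy_gold))
```

In the toy data, annotator `a3` falls below the 80% threshold, so the outlying rating for "wedding" does not enter the mean.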

2.4 Reliability Metrics

Split-half reliability (SHR) was assessed over 1,000 random splits with the following results:

| Dimension | Mean annotations per word | Spearman $\rho$ | Pearson $r$ |
|---|---|---|---|
| Sociability (S) | 7.9 | 0.965 | 0.969 |
| Trust (T) | 11.4 | 0.943 | 0.957 |
| Warmth (W) | 8.8 | 0.965 | 0.974 |

All correlations are reported as $\pm 0.002$.
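One common way to compute split-half reliability is sketched below, under stated assumptions: each word's ratings are randomly divided into two halves, the half-means are correlated across words, and the correlation is averaged over the random splits. The function names, the toy ratings, and the use of Pearson only are illustrative choices; no Spearman-Brown correction is applied, and a Spearman variant would rank the half-means before correlating.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(ratings_by_word, n_splits=1000, seed=0):
    """Mean correlation between half-means over random splits.

    ratings_by_word maps each word to a list of at least two ratings.
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for scores in ratings_by_word.values():
            shuffled = scores[:]          # copy so the input is not mutated
            rng.shuffle(shuffled)
            mid = len(shuffled) // 2
            half_a.append(mean(shuffled[:mid]))
            half_b.append(mean(shuffled[mid:]))
        correlations.append(pearson(half_a, half_b))
    return mean(correlations)


# Toy ratings, invented for illustration only.
toy = {
    "wedding": [3, 3, 2, 3],
    "stalker": [-3, -3, -2, -3],
    "ethicist": [2, 3, 2, 1],
    "outcast": [-2, -1, -2, -2],
}
shr = split_half_reliability(toy)
print(round(shr, 3))
```

A value near 1 indicates that two independent halves of the annotator pool would produce nearly the same ordering of words.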

3. Lexicon Statistics and Distributions

3.1 Categorical Labeling

Each word is assigned a categorical label on a 7-class scale: Very Warm, Moderately Warm, Slightly Warm, Neutral, Slightly Cold, Moderately Cold, Very Cold. The class proportion breakdown is:

| Dimension | Very High | Moderately High | Slightly High | Neutral | Slightly Low | Moderately Low | Very Low |
|---|---|---|---|---|---|---|---|
| Trust (T) | 2.8% | 13.8% | 12.3% | 38.6% | 13.3% | 14.6% | 4.5% |
| Sociability (S) | 11.2% | 12.0% | 12.7% | 16.4% | 13.4% | 27.0% | 7.4% |
| Warmth (W) | 12.3% | 17.0% | 12.7% | 10.5% | 11.7% | 26.9% | 8.8% |

3.2 Empirical Distributions

The distributions for T, S, and W are approximately zero-centered:

$$\bar{T} \approx 0.00, \quad \bar{S} \approx 0.00, \quad \bar{W} \approx 0.00$$

Standard deviations for each scale are $\sim$1.2–1.3.

3.3 Inter-Dimension Correlations

Empirical inter-correlations for real-valued scores across 26k words are moderate to strong:

$$r_{T,S} \approx 0.68, \quad r_{W,T} \approx 0.92, \quad r_{W,S} \approx 0.87 \quad (\text{all } p < 0.001)$$

3.4 Illustrative Word Examples

| Dimension | Top 3 (score) | Bottom 3 (score) |
|---|---|---|
| Trust (T) | consoler (2.00), cohesiveness (2.18), ethicist (2.50) | narcissism (-3.00), horrible (-2.78), denigration (-2.44) |
| Sociability (S) | consoler (3.00), cohesiveness (3.00), wedding (2.88) | stalker (-3.00), gentrify (-1.75), outcast (-1.80) |
| Warmth (W) | consoler (3.00), cohesiveness (3.00), wedding (2.88) | stalker (-3.00), narcissism (-3.00), horrible (-2.78) |

4. Developmental and Applied Insights

4.1 Age-of-Acquisition

Integrating W/T/S norms with age ratings (Kuperman et al. 2012) and binning words into High/Neutral/Low at $\pm 1.5$, developmental analyses show:

  • Children disproportionately acquire high-W and high-S words at early ages; the proportion of low-W/S words rises from age 3 to 17, with $\sim$50% of acquired W/S words being polar at each age.
  • High-T word acquisition remains stable until age 10, declining thereafter as low-T acquisition rises.
  • High-C word acquisition peaks near age 10, later decreasing; low-C word acquisition is highest in early years.

These patterns empirically support the primacy of valence and indicate that sociability is acquired before trust during language development.
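The binning behind such an analysis can be sketched as follows. Several details are assumptions made for illustration: real-valued age-of-acquisition ratings are floored to whole years, the $\pm 1.5$ boundaries are treated as inclusive, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def polarity_bin(score, cutoff=1.5):
    """Map a score on [-3, +3] to High / Neutral / Low (boundaries inclusive)."""
    if score >= cutoff:
        return "High"
    if score <= -cutoff:
        return "Low"
    return "Neutral"


def bin_shares_by_age(words, cutoff=1.5):
    """Return {age: {bin: share}} for (age_of_acquisition, score) pairs."""
    counts = defaultdict(Counter)
    for age, score in words:
        counts[int(age)][polarity_bin(score, cutoff)] += 1  # floor to whole years
    return {
        age: {label: n / sum(c.values()) for label, n in c.items()}
        for age, c in counts.items()
    }


# Toy (age, score) pairs, invented for illustration only.
toy_words = [
    (3.2, 2.5), (3.8, 0.4), (3.5, 2.0),
    (10.1, -2.2), (10.6, 1.8), (10.9, -1.9), (10.3, 0.0),
]
shares = bin_shares_by_age(toy_words)
for age in sorted(shares):
    print(age, shares[age])
```

Comparing the High and Low shares across ages is what exposes trends such as a rising proportion of low-W words in later acquisition.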

4.2 Bias and Stereotype Research

Utilizing both direct lookup and co-occurrence ("co-term") methodologies with large Twitter datasets (Vishnubhotla & Mohammad 2022; Wahle et al. 2025), lexicon analysis reveals established stereotype and bias patterns:

  • Social Groups: muslim, jew, immigrant exhibit low direct W; elderly score high on W but low on C; criminal scores very low on W.
  • Gender Terms: direct scores show high W for all gender terms; father/mother have high C, grandmother low C. Co-term analysis of tweets: references to "you" use more high-C language, "we" more high-W.
  • In-group / Out-group: bilateral analysis of Canadians and Americans finds self-references use higher W/C co-terms, consistent with in-group favoritism.
  • Professions: direct scores—engineers, doctors, teachers high C; nurses and teachers high W; jobless very low. Co-term results: "CEO" higher C context than "engineer"; "doctor" co-terms display more low-C language than "nurse," evidencing context sensitivity.

A plausible implication is that the lexicon, when paired with co-term methods, provides a robust foundation for quantitative bias and stereotype investigations in digital discourse.
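A co-term analysis could be approximated as in the sketch below, which averages the lexicon scores of words that appear in the same post as a target term. The exact procedure used in the cited studies is not described here, so the whitespace tokenization, the function name `co_term_mean`, and the toy posts are assumptions; the three Warmth scores are the ones listed in Section 3.4.

```python
from statistics import mean


def co_term_mean(posts, target, lexicon):
    """Average lexicon score of scored words co-occurring with `target` in a post.

    Returns None when no scored co-terms are found.
    """
    scores = []
    for post in posts:
        tokens = post.lower().split()  # naive whitespace tokenization
        if target not in tokens:
            continue
        scores.extend(lexicon[t] for t in tokens if t != target and t in lexicon)
    return mean(scores) if scores else None


# Warmth scores as listed in Section 3.4; the posts are invented.
toy_lexicon = {"wedding": 2.88, "stalker": -3.00, "horrible": -2.78}
toy_posts = [
    "my neighbor came to the wedding",
    "the neighbor had a horrible day",
    "a stalker followed the courier",
]
print(co_term_mean(toy_posts, "neighbor", toy_lexicon))
print(co_term_mean(toy_posts, "courier", toy_lexicon))
```

With realistic data, the comparison of interest is between the co-term means of two targets computed over many posts, not the value for any single post.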

5. Practical Integration, Limitations, and Ethical Considerations

5.1 Usage Guidance

  • Text scoring: For any text, scores can be assigned to each token for T, S, W, enabling calculation of mean, sum, or "polar" differential aggregates.
  • Comparative analysis: Researchers may examine relative differences (e.g., percent increases in high-W words) across temporal or group splits.
  • Bias/stereotype investigation: Co-term pairing (Turney 2002; Teodorescu & Mohammad 2023) facilitates measurement of W/C usage around target entities.
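A minimal text-scoring sketch for the first item above is shown here. The whitespace tokenization, the function name `score_text`, and the reading of the "polar" differential as the share of high-scoring tokens minus the share of low-scoring tokens (with the $\pm 1.5$ cutoff borrowed from Section 4.1) are assumptions, not a documented procedure. The Warmth scores are those listed in Section 3.4.

```python
def score_text(text, lexicon, cutoff=1.5):
    """Summarize lexicon scores for the tokens of `text` found in `lexicon`.

    Returns None if no token is covered by the lexicon.
    """
    tokens = text.lower().split()  # naive whitespace tokenization
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return None
    high = sum(1 for s in scores if s >= cutoff)
    low = sum(1 for s in scores if s <= -cutoff)
    return {
        "mean": sum(scores) / len(scores),
        "sum": sum(scores),
        "polar_diff": (high - low) / len(scores),  # share high minus share low
    }


# Warmth scores as listed in Section 3.4; the sentence is invented.
toy_lexicon = {"consoler": 3.00, "wedding": 2.88, "stalker": -3.00}
summary = score_text("The consoler left the wedding before the stalker arrived", toy_lexicon)
print(summary)
```

Scores for a single sentence like this one are only illustrative; the aggregates become meaningful when computed over many texts and compared across groups or time periods.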

5.2 Limitations and Considerations

  • The lexicon covers 26k unigrams, favoring U.S.-centric corpora.
  • Scores reflect predominant word senses; specialized or ambiguous terms may require re-annotation.
  • Annotator pool is skewed toward U.S., Canada, UK, India—demographic biases are possible.
  • Lexicon scores reflect common perceptions (association norms), not objective reality.
  • Not suitable for assessing single utterances; reliability requires aggregate analysis over multiple items.
  • Scores are context-sensitive; comparative framing is recommended.
  • Essentializing speakers should be avoided; focus should be on the use of warmth-related language in context.

All resources are released under terms prohibiting direct redistribution in large training corpora. The lexicon supports interdisciplinary research spanning social cognition, computational bias analysis, digital humanities, and sentiment modeling, and is intended to enrich the quantitative study of linguistic social perception.
