Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change (1605.09096v6)

Published 30 May 2016 in cs.CL

Abstract: Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test. Word embeddings show promise as a diachronic tool, but have not been carefully evaluated. We develop a robust methodology for quantifying semantic change by evaluating word embeddings (PPMI, SVD, word2vec) against known historical changes. We then use this methodology to reveal statistical laws of semantic evolution. Using six historical corpora spanning four languages and two centuries, we propose two quantitative laws of semantic change: (i) the law of conformity---the rate of semantic change scales with an inverse power-law of word frequency; (ii) the law of innovation---independent of frequency, words that are more polysemous have higher rates of semantic change.

Authors (3)

William L. Hamilton
Jure Leskovec
Dan Jurafsky

Citations (895)

Summary

Analysis of "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change"

The paper, by Hamilton, Leskovec, and Jurafsky, uses word embeddings to study semantic evolution. It develops a diachronic methodology for quantifying how word meanings change over time and then uses that methodology to uncover two statistical laws of semantic change.

Methodological Framework

The authors use three word representation techniques (PPMI, SVD, and SGNS) to track semantic shifts across six historical corpora spanning four languages (English, German, French, and Chinese) and two centuries: the Google Books N-gram corpora EngAll, EngFic, FreAll, GerAll, and ChiAll, plus the Corpus of Historical American English (COHA). Comparing the methods against one another lets the authors establish which of them detect semantic change reliably.

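PPMI (Positive Pointwise Mutual Information): A sparse, count-based representation in which each dimension corresponds to a context word and stores the positive association between the target word and that context.
SVD (Singular Value Decomposition): A dimensionality reduction of the PPMI matrix into dense, low-dimensional vectors.
SGNS (Skip-gram with Negative Sampling, i.e., word2vec): A predictive method that learns dense vectors by training to predict the context words a target word co-occurs with; it also allows embeddings to be initialized incrementally across time periods.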
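To make the first two representations concrete, here is a minimal NumPy sketch that builds a PPMI matrix from a word-by-context count matrix and then truncates its SVD. The function names are hypothetical, the sketch omits refinements such as context-distribution smoothing, and the singular-value exponent `power` is an assumed, illustrative value rather than the paper's setting. SGNS is not shown because it requires training a neural model.

```python
import numpy as np

def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive PMI from a (words x contexts) co-occurrence count matrix."""
    p_wc = counts / counts.sum()                 # joint P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)        # marginal P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)        # marginal P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc) - np.log(p_w) - np.log(p_c)
    pmi[~np.isfinite(pmi)] = 0.0                 # zero counts carry no association
    return np.maximum(pmi, 0.0)                  # keep only positive associations

def svd_embeddings(m: np.ndarray, dim: int, power: float = 0.5) -> np.ndarray:
    """Dense embeddings from the top-`dim` singular vectors of `m`.

    `power` weights the singular values; 0.5 is an assumed, illustrative choice.
    """
    u, s, _ = np.linalg.svd(m, full_matrices=False)   # s is sorted descending
    return u[:, :dim] * s[:dim] ** power

# Usage on random toy counts (hypothetical data).
rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(50, 50)).astype(float)
emb = svd_embeddings(ppmi(counts), dim=10)
print(emb.shape)  # (50, 10)
```

Each row of `emb` is one word's low-dimensional vector; a separate matrix of this kind would be built for every time period.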

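To compare word vectors across historical periods, the SVD and SGNS embeddings are aligned over time using orthogonal Procrustes. Alignment is needed because these methods determine each period's vectors only up to an arbitrary orthogonal transformation. PPMI vectors need no alignment, since their dimensions are the context words themselves.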
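The alignment step can be sketched with NumPy as follows, assuming both embedding matrices share a vocabulary whose rows are in the same order. The function name is hypothetical, and the common preprocessing step of length-normalizing vectors first is omitted for brevity. The orthogonal matrix minimizing the Frobenius distance comes from the SVD of the cross-product of the two matrices.

```python
import numpy as np

def procrustes_align(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Rotate embedding matrix `a` onto `b` with the best orthogonal map.

    Rows of `a` and `b` must index the same shared vocabulary, in the same order.
    Solves min_R ||a @ R - b||_F subject to R.T @ R = I.
    """
    u, _, vt = np.linalg.svd(a.T @ b)
    r = u @ vt                 # orthogonal Procrustes solution
    return a @ r               # a rotation preserves cosine similarities within `a`

# Toy check: `a` is `b` under an unknown rotation, so alignment should recover `b`.
rng = np.random.default_rng(1)
b = rng.normal(size=(20, 5))
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix
a = b @ q.T
print(np.allclose(procrustes_align(a, b), b))  # True
```

Restricting the map to be orthogonal means alignment cannot change similarities within a single period; it only puts the periods on common axes.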

Evaluation of Techniques

The paper rigorously evaluates the three embedding methods on two fronts: synchronic accuracy and diachronic validity.

Synchronic Accuracy: Performance is assessed against a modern similarity benchmark (the MEN dataset), revealing that SVD outperforms PPMI and SGNS in capturing word similarities within individual time periods.
Diachronic Validity: Evaluation is two-fold:
- Detection of Known Shifts: Using a set of historically attested semantic changes (e.g., the shift of "gay" from "cheerful" to "homosexual"), SGNS performs best on the EngAll dataset, though its performance declines on smaller datasets such as COHA.
- Discovery of Shifts: When each method lists its top-10 most semantically shifted words between the 1900s and the 1990s, SGNS notably excels, surfacing genuine historical shifts (e.g., "wanting" shifting from "lacking" to "desiring") at a higher rate than SVD and PPMI.
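The discovery step amounts to ranking words by how far their aligned vectors moved between two periods. A minimal sketch, assuming cosine distance as the measure of change and already-aligned matrices whose rows follow a shared word list; the function names and toy vectors are hypothetical. The resulting list would still need to be judged by hand, as in the paper's evaluation.

```python
import numpy as np

def cosine_distance_rows(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Row-wise cosine distance between two aligned embedding matrices."""
    dots = np.sum(x * y, axis=1)
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    return 1.0 - dots / norms

def most_shifted(words, old, new_aligned, k=10):
    """Return the k words whose vectors moved furthest between two periods."""
    dist = cosine_distance_rows(old, new_aligned)
    top = np.argsort(-dist)[:k]              # largest distances first
    return [(words[i], float(dist[i])) for i in top]

# Toy vectors for three hypothetical words in an "old" and a "new" period.
words = ["stable", "shifted", "drifting"]
old = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
new = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ranked = most_shifted(words, old, new, k=2)
print(ranked)  # "shifted" (distance 1.0) ranks above "drifting"
```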

Statistical Laws of Semantic Change

The paper's main novel contribution is two statistical laws, obtained by regressing rates of semantic change on word properties across the six corpora:

Law of Conformity: The rate of semantic change decreases with word frequency following a power law:

$\Delta(w_i) \propto f(w_i)^{\beta_f}$

where $\beta_f$ ranges between -1.24 and -0.30 across datasets. Because the exponent is negative, more frequent words change more slowly; frequently used words are semantically more stable, a pattern the authors call conformity.
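Because $\Delta \propto f^{\beta_f}$ becomes linear after taking logarithms ($\log \Delta = \beta_f \log f + \text{const}$), the exponent can be estimated as a slope in log-log space. The sketch below is a simplified, pooled least-squares fit on synthetic, noise-free data with an assumed exponent of -0.3; the paper's own regression includes additional terms that are not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def fit_power_law(freq, delta):
    """Fit delta ~ c * freq**beta by least squares on log delta vs. log freq."""
    x = np.log(np.asarray(freq, dtype=float))
    y = np.log(np.asarray(delta, dtype=float))
    beta, log_c = np.polyfit(x, y, deg=1)   # slope first, then intercept
    return beta, np.exp(log_c)

# Synthetic data: relative frequencies and change rates with an assumed beta_f = -0.3.
freq = np.logspace(-6, -2, 50)
delta = 0.001 * freq ** -0.3
beta, c = fit_power_law(freq, delta)
print(round(beta, 3))  # -0.3
```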

Law of Innovation: Controlling for frequency, words with higher polysemy show higher rates of semantic change:

$\Delta(w_i) \propto d(w_i)^{\beta_d}$

where $\beta_d$ ranges from 0.08 to 0.53. This finding indicates that polysemous words are more adaptable and subject to semantic drift.
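"Controlling for frequency" means both exponents are estimated jointly, so the polysemy effect is measured with frequency held fixed. A minimal sketch of such a joint log-log regression, assuming $d(w_i)$ is already available as a positive score (how the paper measures polysemy is not covered here); the function name and the exponents in the synthetic data are hypothetical, and the paper's full model has extra terms omitted here.

```python
import numpy as np

def fit_joint_exponents(freq, poly, delta):
    """Regress log delta on log freq and log poly jointly; return (beta_f, beta_d)."""
    x = np.column_stack([np.log(freq), np.log(poly), np.ones(len(freq))])
    coef, *_ = np.linalg.lstsq(x, np.log(delta), rcond=None)
    beta_f, beta_d, _intercept = coef
    return beta_f, beta_d

# Synthetic data: polysemy is made to correlate with frequency by construction,
# so regressing on polysemy alone would mix the two effects.
rng = np.random.default_rng(2)
freq = rng.uniform(1e-6, 1e-3, size=200)
poly = freq ** 0.2 * rng.uniform(0.5, 2.0, size=200)
delta = 0.002 * freq ** -0.3 * poly ** 0.4        # assumed exponents
beta_f, beta_d = fit_joint_exponents(freq, poly, delta)
print(round(beta_f, 3), round(beta_d, 3))  # -0.3 0.4
```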

Implications and Future Directions

The statistical laws uncovered by this paper have implications for both historical linguistics and computational models of language. The law of conformity implies that any model predicting semantic change must account for frequency effects. The law of innovation, which ties the rate of change to polysemy, suggests a feedback between word senses and semantic evolution: words that already carry many senses are the ones most likely to shift further.

Further exploration might expand these findings to more languages and longer time frames, providing deeper insights into the mechanisms of language change. Additionally, understanding the causal factors behind these statistical laws—such as sociocultural influences or cognitive biases—can enrich theoretical models of language evolution and inform applications in natural language processing, particularly in tasks like word sense disambiguation and historical text analysis.

This paper connects traditional linguistic theories with modern computational methods and makes a significant contribution to diachronic semantics. Its methodological framework and the proposed statistical laws provide a basis for future research into the principles governing language evolution.
