Dynamic Word Embeddings

Updated 12 May 2026
  • Dynamic word embeddings are vector representations that adapt to changes in context, time, or domain while capturing semantic drift and polysemy.
  • Methodologies include dynamic skip-gram models with Gaussian diffusion, tensor factorization with alignment, and neural architectures conditioned on extralinguistic metadata.
  • They enable improved performance in bias analysis, sentiment classification, and event tracking, offering clearer insights compared to static embedding approaches.

Dynamic word embeddings are vector-space representations of lexical items that capture semantic, syntactic, or discourse-level properties that change as a function of extrinsic attributes—most commonly time, domain, or context. Unlike static embeddings, which assign each word type a single vector regardless of context or epoch, dynamic word embeddings seek to model semantic drift, polysemy, social variation, and emerging linguistic phenomena by learning time- or context-dependent representations. Methodologies span continuous dynamical systems, probabilistic models with temporal priors, low-rank matrix/tensor decompositions, and contextualized neural architectures adapted to extralinguistic metadata.

1. Mathematical Formulations and Model Classes

Dynamic word embeddings are built upon several mathematically distinct frameworks, unified by the central operation of mapping a tuple (word, context, extralinguistic attribute) to a vector in $\mathbb{R}^d$ or (in quantum-inspired models) a unit-norm vector in a Hilbert space.

Continuous and Discrete-Time Trajectory Models

A canonical formulation is the use of a sequence of embedding matrices $\{U_t\}$, where $t$ indexes discrete time slices or domains, and each $U_t[w]$ is the vector embedding for word $w$ at $t$. This is prominent in time-series generalizations of skip-gram and CBOW:

  • Dynamic Skip-Gram (DSG): Each $u_{i,t}$ evolves via a Gaussian (or Ornstein–Uhlenbeck) prior:

$$U_t \mid U_{t-1} \sim \mathcal{N}(U_{t-1}, D I)$$

Embeddings are optimized for likelihood under observed co-occurrence data, regularized by these temporal priors (Bamler et al., 2017, Montariol et al., 2019).

  • Dynamic Bernoulli Embeddings (DBE): Each word embedding $\rho_v^{(t)}$ follows a Gaussian random-walk prior:

$$\rho_v^{(t)} \mid \rho_v^{(t-1)} \sim \mathcal{N}(\rho_v^{(t-1)}, \lambda^{-1} I)$$

Context embeddings remain static (Rudolph et al., 2017, Montariol et al., 2019).

  • Matrix/Tensor Factorization with Alignment: Regularized joint factorizations over the sequence $\{U_t\}$ simultaneously enforce slice-specific fidelity and temporal smoothness (Yao et al., 2017, Brandl et al., 2022).
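To make the idea of a temporally smoothed joint factorization concrete, the following is a minimal sketch, not the cited authors' implementation. It assumes one common form of such an objective: a squared reconstruction error per slice, a Frobenius-norm penalty, and a squared penalty on differences between adjacent slices. The names `Y`, `lam`, `tau`, and `smoothed_factorization`, as well as the toy data, are hypothetical.

```python
import numpy as np

def smoothed_factorization(Y, d=4, lam=0.1, tau=1.0, lr=0.005, steps=400, seed=0):
    """Plain gradient descent on an ASSUMED objective (illustrative only):
        sum_t ||Y[t] - U[t] U[t]^T||_F^2
        + lam * sum_t ||U[t]||_F^2
        + tau * sum_{t>=1} ||U[t] - U[t-1]||_F^2
    Y has shape (T, V, V); the returned U has shape (T, V, d).
    """
    rng = np.random.default_rng(seed)
    T, V, _ = Y.shape
    U = 0.1 * rng.standard_normal((T, V, d))
    history = []
    for _ in range(steps):
        R = Y - U @ U.transpose(0, 2, 1)   # per-slice reconstruction residual
        diff = U[1:] - U[:-1]              # drift between adjacent slices
        loss = (R ** 2).sum() + lam * (U ** 2).sum() + tau * (diff ** 2).sum()
        history.append(float(loss))
        # Gradient of the reconstruction and norm terms.
        grad = -2.0 * (R + R.transpose(0, 2, 1)) @ U + 2.0 * lam * U
        # Gradient of the smoothness term couples neighbouring slices.
        grad[1:] += 2.0 * tau * diff
        grad[:-1] -= 2.0 * tau * diff
        U = U - lr * grad
    return U, history

# Toy data: five slices of a slowly drifting low-rank similarity matrix.
rng = np.random.default_rng(1)
T, V, d = 5, 20, 4
A = 0.3 * rng.standard_normal((V, d))
Y = np.empty((T, V, V))
for t in range(T):
    A = A + 0.02 * rng.standard_normal((V, d))
    Y[t] = A @ A.T

U, history = smoothed_factorization(Y, d=d)
print("loss at start:", history[0], "loss at end:", history[-1])
```

Because all slices are optimized jointly and tied together by the `tau` term, the resulting `U[t]` share one coordinate system, which is the property the joint approaches above rely on.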

Attribute-Conditioned and Contextualized Representations

Attribute-conditioned models extend the embedding function to arbitrary attributes: the embedding of a word under a given attribute value (e.g., a time, domain, or city) is the sum of a global (attribute-invariant) embedding and a learned attribute-specific offset (Gillani et al., 2019).
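The additive structure can be illustrated with a small lookup sketch. This is an assumption-laden toy, not the cited model: the class name `AdditiveAttributeEmbedding`, its fields, and the example words and attribute values are all hypothetical, and no training is shown.

```python
import numpy as np

class AdditiveAttributeEmbedding:
    """Toy container: vector(word, attribute) = global vector + attribute offset."""

    def __init__(self, vocab, attributes, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.word_index = {w: i for i, w in enumerate(vocab)}
        self.attr_index = {a: i for i, a in enumerate(attributes)}
        # One attribute-invariant vector per word.
        self.global_emb = rng.standard_normal((len(vocab), dim))
        # One offset per (attribute value, word); zeros mean "no deviation".
        self.offsets = np.zeros((len(attributes), len(vocab), dim))

    def vector(self, word, attribute):
        i = self.word_index[word]
        a = self.attr_index[attribute]
        return self.global_emb[i] + self.offsets[a, i]

model = AdditiveAttributeEmbedding(["bank", "river"], ["1990s", "2010s"], dim=3)
# Hypothetical learned offset: "bank" deviates from its global vector in the 2010s.
model.offsets[model.attr_index["2010s"], model.word_index["bank"]] = 0.5
print(model.vector("bank", "2010s") - model.vector("bank", "1990s"))
```

Since every attribute-specific vector is expressed relative to the same global vector, vectors for different attribute values are directly comparable without a separate alignment step.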

Dynamic Contextualized Embeddings

Neural architectures based on pre-trained language models (PLMs) have been adapted to dynamic settings by augmenting or conditioning input embeddings dynamically:

  • Dynamic Contextualized Word Embeddings (DCWE): The dynamic input for a token observed in a given social unit at a given time depends on a social-context embedding obtained via a time-specific graph attention network, regularized by Gaussian anchoring and temporal random-walk priors (Hofmann et al., 2020).

  • Template-Based Temporal Adaptation: Masked language models (MLMs) are adapted to later timestamps using temporally-sensitive prompts derived from anchor/pivot term extraction, leading to new embedding parameters specific to the epoch (Tang et al., 2022).

Hilbert-Space and Quantum-Contextual Approaches

A recent alternative is based on quantum contextuality:

  • Quantum Contextual Embeddings: Each word is a unit vector, and each context is an orthonormal basis of the Hilbert space. Word sense in a given context is determined probabilistically via the Born rule, i.e., from squared inner-product magnitudes. Polysemy arises from vectors appearing in multiple, possibly incompatible, bases (Svozil, 2025).
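As a numerical illustration of the Born rule only, the sketch below projects a unit vector onto two different orthonormal bases and squares the overlaps. It assumes real-valued vectors and randomly generated bases; which vectors play the role of words, senses, or states in the cited proposal is not specified here, and the names `random_orthonormal_basis` and `born_probabilities` are hypothetical.

```python
import numpy as np

def random_orthonormal_basis(dim, rng):
    """Columns of Q from a QR decomposition form an orthonormal basis."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def born_probabilities(vector, basis):
    """Squared overlaps of a unit vector with each basis vector (columns)."""
    return np.abs(basis.T @ vector) ** 2

rng = np.random.default_rng(0)
dim = 4
vector = rng.standard_normal(dim)
vector /= np.linalg.norm(vector)            # unit norm

basis_a = random_orthonormal_basis(dim, rng)  # stand-in for one context
basis_b = random_orthonormal_basis(dim, rng)  # stand-in for another context

probs_a = born_probabilities(vector, basis_a)
probs_b = born_probabilities(vector, basis_b)
print(probs_a, probs_a.sum())
print(probs_b, probs_b.sum())
```

The same vector yields a valid probability distribution in each basis, but the two distributions generally differ, which is the sense in which the outcome depends on the chosen context.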

2. Training Objectives and Optimization Algorithms

Training objectives for dynamic word embeddings combine likelihood under observed corpora and explicit temporal, structural, or contextual regularization.

Temporal Priors and Drift Regularization

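  • Random Walk / Diffusion Priors: Temporal smoothing is enforced via Gaussian penalties on drift. For a random-walk prior $\rho_v^{(t)} \mid \rho_v^{(t-1)} \sim \mathcal{N}(\rho_v^{(t-1)}, \lambda^{-1} I)$, the negative log-prior is, up to an additive constant,

$$\frac{\lambda}{2} \sum_{v} \sum_{t} \left\| \rho_v^{(t)} - \rho_v^{(t-1)} \right\|_2^2$$

This penalizes abrupt changes and yields smooth, interpretable trajectories (Bamler et al., 2017, Rudolph et al., 2017, Montariol et al., 2019).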

  • HardThreshold Drift Regularizer: A hard-thresholded drift penalty enhances the separation of stable and drifting words under data scarcity (Montariol et al., 2019).
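The effect of a Gaussian random-walk penalty on drift can be checked numerically. The sketch below assumes the standard quadratic form (half the precision times the summed squared differences between adjacent time steps); the function name `drift_penalty`, the precision parameter `lam`, and the synthetic trajectories are hypothetical.

```python
import numpy as np

def drift_penalty(trajectory, lam=1.0):
    """Quadratic random-walk penalty for embeddings of shape (T, V, d):
    0.5 * lam * sum over t of ||trajectory[t] - trajectory[t-1]||^2.
    """
    diffs = np.diff(trajectory, axis=0)
    return 0.5 * lam * float(np.sum(diffs ** 2))

rng = np.random.default_rng(0)
T, V, d = 10, 5, 3

# Smooth trajectory: cumulative sum of small Gaussian steps.
smooth = np.cumsum(0.01 * rng.standard_normal((T, V, d)), axis=0)

# Same trajectory with one abrupt jump halfway through.
jumpy = smooth.copy()
jumpy[5:] += 1.0

print("smooth:", drift_penalty(smooth))
print("jumpy: ", drift_penalty(jumpy))
```

A single abrupt shift dominates the penalty, which illustrates why purely Gaussian drift priors favour gradual change over sudden change-points.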

Matrix/Tensor Factorization Alignment

  • Joint Regularized Factorization: Loss functions combine reconstruction terms, pairwise-alignment terms, and latent structural affinity weights, with the affinity weights learned by inverting slice distances (Brandl et al., 2022).

Neural and Contextualized Optimization

  • DCWE and Temporal/Attribute Adaptation: Models are optimized end-to-end with cross-entropy on masked LM or task objectives, plus anchoring and random-walk priors on offset parameters. Graph-based and feed-forward modules modeling external structure are updated by backpropagation alongside the base PLM parameters (Hofmann et al., 2020, Tang et al., 2022).

Quantum Contextual Training (Theoretical)

  • KL-Divergence from Target Sense Distribution: Not implemented at scale; proposed as minimizing the KL divergence between a target sense distribution and the model's Born-rule sense distribution, with joint optimization of the model parameters under orthonormality constraints (Svozil, 2025).

3. Temporal and Contextual Alignment Techniques

The "alignment problem"—the lack of consistent coordinate systems across independently trained time/domain slices—necessitated the development of alignment-aware dynamic models.

  • Joint-Smoothing and Alignment: Regularized models (temporal priors, structural constraints) jointly couple embeddings at all time points, eliminating the need for post-hoc orthogonal Procrustes alignment and producing consistent trajectories (Bamler et al., 2017, Yao et al., 2017, Brandl et al., 2022).
  • Unified Embedding Space: Attribute-conditioned additive models inherently align all attribute-specific embeddings via the shared global (attribute-invariant) embedding (Gillani et al., 2019).
  • Contextual Adaptation in PLMs: Dynamic contextualization is achieved by reparameterizing input layers and fine-tuning transformer-based architectures with temporal/social prompts, ensuring embeddings are adapted yet still comparable across epochs or social units (Hofmann et al., 2020, Tang et al., 2022).
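For reference, the post-hoc orthogonal Procrustes alignment that joint models aim to avoid can be sketched in a few lines. This is the standard SVD-based solution applied to synthetic data; the function name `procrustes_rotation` and the toy matrices are hypothetical, and real pipelines typically restrict the fit to a shared vocabulary.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal R minimizing ||X @ R - Y||_F, via the SVD of X^T Y."""
    u, _, vt = np.linalg.svd(X.T @ Y)
    return u @ vt

rng = np.random.default_rng(0)
V, d = 50, 5

# Slice-1 embeddings, and slice-2 embeddings that are the same up to a rotation.
X = rng.standard_normal((V, d))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
Y = X @ Q

R = procrustes_rotation(X, Y)
print("alignment error:", np.linalg.norm(X @ R - Y))
```

Independently trained slices can differ by exactly this kind of arbitrary rotation, so distances between a word's vectors across slices are only meaningful after such an alignment or when the model couples the slices during training.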

4. Evaluation, Empirical Findings, and Applications

Dynamic embeddings have been empirically validated via intrinsic and extrinsic tasks. Key experimental paradigms include:

Intrinsic Metrics

Extrinsic and Downstream Tasks

  • Bias Analysis: Dynamic embeddings enable measurement of gender and ethnic occupation bias trajectories and their alignment with demographic data (Gillani et al., 2019).
  • Sentiment and Classification: Incorporating dynamic contextualization yields modest but statistically significant improvements in classification accuracy and $F_1$ (Hofmann et al., 2020).
  • Event and Concept Tracking: Changes in nearest-neighbor sets over time have been used to track sociological and technological shifts (e.g., “blackberry” shifting from fruit to device and back) (Brandl et al., 2022).
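Tracking change through nearest-neighbor sets can be sketched as follows. The example assumes the two slices are already in a common coordinate system and uses cosine similarity with a Jaccard overlap between neighbor sets; the two-dimensional vectors are invented for illustration and are not taken from any trained model, and the function names are hypothetical.

```python
import numpy as np

def nearest_neighbors(E, idx, k):
    """Indices of the k rows of E most cosine-similar to row idx (excluding idx)."""
    normed = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    sims[idx] = -np.inf
    return set(np.argsort(-sims)[:k].tolist())

def neighbor_overlap(E_a, E_b, idx, k=2):
    """Jaccard overlap between a word's neighbor sets in two slices."""
    a = nearest_neighbors(E_a, idx, k)
    b = nearest_neighbors(E_b, idx, k)
    return len(a & b) / len(a | b)

vocab = ["blackberry", "apple", "pear", "phone", "email", "device"]

# Made-up 2-d vectors: axis 0 ~ "fruit", axis 1 ~ "technology".
slice_early = np.array([[1.0, 0.1], [1.0, 0.0], [0.9, 0.1],
                        [0.0, 1.0], [0.1, 1.0], [0.1, 0.9]])
slice_late = slice_early.copy()
slice_late[0] = [0.1, 1.0]   # "blackberry" moves toward the technology cluster

target = vocab.index("blackberry")
early = sorted(vocab[i] for i in nearest_neighbors(slice_early, target, 2))
late = sorted(vocab[i] for i in nearest_neighbors(slice_late, target, 2))
print(early, late, neighbor_overlap(slice_early, slice_late, target))
```

A low overlap for a word whose neighbors are otherwise stable is a simple signal that its usage has shifted between the two slices.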

Polysemy and Dimensionality

  • Stochastic-Dimensionality Models: The number of embedding dimensions per word, inferred nonparametrically, reflects word frequency and degree of polysemy, with broad terms allocated more active dimensions (Nalisnick et al., 2015).
  • Quantum Contextuality: Proposed as an alternative mechanism to statically encode context/prominence of word senses via joint participation in distinct bases, offering an explicit probabilistic sense distribution (Svozil, 2025).

5. Practical Considerations: Data Scarcity, Initialization, and Scalability

Dynamic embedding models must address challenges aggravated by temporal/data sparsity and high dimensionality.

Data Scarcity

  • Smoothing and Sharing: Temporal priors (e.g., diffusion, random walk) and global, attribute-invariant embeddings smooth over sparse slices, preserving continuity and suppressing noise (Montariol et al., 2019, Gillani et al., 2019).
  • Initialization: Static pre-training (on concatenated corpora) yields significant gains under data scarcity. Backward-initialization—aligning from large, late-period corpora—can be optimal for long diachronic ranges (Montariol et al., 2019).
  • Regularizers: Hard-thresholded drift penalties enhance interpretability of semantic drift under low-resource conditions (Montariol et al., 2019).

Scalability

  • Block Coordinate and Minibatch Optimization: Efficient sparse matrix operations, block coordinate updates, and scalable variational inference enable learning on corpora spanning tens to hundreds of time slices and large vocabularies (Yao et al., 2017, Bamler et al., 2017).
  • Contextualized PLMs: Augmentations to BERT-scale models remain tractable through modular feed-forward and graph attention layers, only marginally increasing wall-time or memory (Hofmann et al., 2020).
  • Quantum Contextual Models: While mathematically attractive, scaling joint learning of intertwining bases and orthonormality constraints to large lexicons remains an unsolved challenge (Svozil, 2025).

6. Limitations, Open Questions, and Frontier Directions

Dynamic embedding research highlights several unresolved issues:

  • Abrupt Change and Non-Gaussian Dynamics: Existing models predominantly assume smooth Gaussian (Brownian/O-U) drift, limiting detection of sudden concept shifts or change-points. Extensions to piecewise or nonstationary priors are an open problem (Yao et al., 2017).
  • Cross-Linguistic Trajectory Analysis: Dynamic embeddings facilitate cross-lingual comparison of semantic drift post static-alignment, but aligning trajectories with fine temporal granularity in multilingual settings presents both computational and theoretical difficulties (Montariol et al., 2019).
  • Polysemy Modeling: While stochastic-dimensional and quantum models offer interpretable proxies for word complexity and sense distribution, robust benchmarks for evaluating polysemous and context-sensitive representations over time are scarce (Nalisnick et al., 2015, Svozil, 2025).
  • Evaluation Paradigms: Most work is restricted to intrinsic evaluation (semantic similarity, analogy, or drift visualization); few extrinsic tasks exist that are specifically sensitive to temporal or contextual adaptation. Development of gold-standard benchmarks for dynamic sense disambiguation remains a priority (Yao et al., 2017, Montariol et al., 2019).
  • Template and Prompt-Based Adaptation: Techniques for automated template generation and selection in the temporal adaptation of PLMs are active research areas. The balance between template diversity and noise, as well as the generalization to low-resource or multilingual settings, remains underexplored (Tang et al., 2022).

7. Summary Table: Core Dynamic Embedding Methodologies

Model/Method | Core Mathematical Device | Temporal Regularization
Dynamic Skip-Gram / DSG | Sequential Gaussian diffusion, ELBO | Explicit, via prior
Dynamic Bernoulli Embeddings / DBE | Random-walk prior on embeddings | Explicit, via prior
Structure Prediction (W2VPred) | Joint factorization, latent affinity matrix | Implicit, via affinity weights
Unified Additive Attribute Models | Global embedding + per-attribute offsets | Implicit/global
Dynamic Contextualized Embedding (DCWE) | FFN + GAT over social/time, PLM adaptation | Anchoring, random-walk
Stochastic Dimensionality Skip-Gram (SD-SG) | Distribution over embedding dimension | Geometric+nonparametric
Quantum Contextual Word Embedding | Hilbert space, intertwining contexts | Theoretical

Each model is distinguished by (i) the locus of representation dynamics (type, token, context, attribute), (ii) the form and explicitness of temporal or contextual priors, and (iii) the relationship between embedding alignment and learning.


Dynamic word embeddings constitute a broad, technically rigorous field integrating time series analysis, Bayesian inference, graph and matrix factorization, and neural contextualization. The current frontier spans quantum formalizations, contextually adaptive PLMs, and scalable cross-linguistic models, with empirical focus steadily shifting from static, temporally-agnostic word spaces to architectures reflecting the true dynamism of language in use. Further progress depends on advances in scalable optimization, enriched annotation and evaluation paradigms, and theoretical innovations that reconcile continuity, abruptness, and interpretability in lexical semantics.
