
Static Co-occurrence Embeddings

Updated 6 March 2026
  • Static co-occurrence embeddings are representations of discrete items derived from observed co-occurrence patterns, offering high interpretability and computational efficiency.
  • They utilize diverse methods including count-based probability normalization, matrix/tensor factorization, and neural prediction to support applications in NLP, vision, and urban planning.
  • Their strengths include scalability and direct empirical interpretability, though they trade off the dynamic context sensitivity found in newer embedding paradigms.

Static co-occurrence-based embeddings are a foundational paradigm for representing symbolic items—typically words, but also objects, tags, emojis, or other discrete entities—via fixed-dimensional vectors derived directly from observed co-occurrence statistics in large datasets. These embeddings serve as an information-preserving, low-dimensional interface between raw, high-cardinality count data and downstream statistical or neural models. Unlike contextual embeddings, static co-occurrence-based embeddings associate each item with a single, context-independent vector learned from the empirical distribution of its co-occurrence with other items. The design space encompasses explicit count-matrix methods, neural likelihood models, graph-theoretic constructions, spectral algorithms, tensor factorization approaches, and hybrid extensions incorporating semantic signals. Their interpretability, computational efficiency, and broad applicability make them central across natural language processing, vision-language tasks, urban planning, and knowledge extraction.

1. Core Methodologies for Constructing Static Co-occurrence Embeddings

Static co-occurrence embeddings are fundamentally built from a co-occurrence matrix or related higher-order array. The workflow typically involves:

  • Co-occurrence Matrix Construction: For a vocabulary $\mathcal{V}$ of $n$ items, define $M \in \mathbb{N}^{n \times n}$ such that $M_{ij}$ counts how often items $i$ and $j$ co-occur in a window (textual, visual, or semantic) (Gallardo et al., 9 Nov 2025, Dugué et al., 2019). Variations include raw counts, PPMI (positive pointwise mutual information), semantic affinity, or symmetrized conditional probabilities.
  • Row-normalization/Probability Embeddings: For discrete domains or small taxonomies (e.g., urban-object types or impression tags), a simple, interpretable embedding is the normalized conditional probability vector:

$$v_i = \big[P(o_1 \mid o_i),\ P(o_2 \mid o_i),\ \ldots,\ P(o_n \mid o_i)\big], \qquad P(o_j \mid o_i) = \frac{M_{ij}}{\sum_{k=1}^{n} M_{ik}}.$$

This approach is explicit and guarantees that each dimension corresponds directly to a reference class (Gallardo et al., 9 Nov 2025); a minimal count-and-normalize sketch appears after this list.

  • Matrix/Tensor Factorization: Canonical models (e.g., GloVe) factorize a weighted, global co-occurrence matrix (often PPMI-weighted) to obtain low-rank embeddings. Extensions perform symmetric Canonical Polyadic (CP) or joint factorization of higher-order tensors $X^{(N)}$ to capture $N$-way interactions, supporting polysemy and richer relational structure (Bailey et al., 2017, Bollegala et al., 2017).
  • Prediction-based Neural Embeddings: Models such as skip-gram with negative sampling, CBOW, and fastText implicitly factorize co-occurrence statistics by optimizing the prediction of context given a center item or vice versa. Static embeddings are derived from the learned projection matrices (Bihani et al., 2020).
  • Graph-based and Spectral Methods: Co-occurrence matrices can be interpreted as adjacency matrices for graphs whose nodes are items and edges represent observed affinities (possibly weighted by PPMI, TF-IDF, or symmetrized conditional co-occurrences). Spectral embedding of normalized Laplacians, community detection with label propagation, and LINE-style graph embeddings provide scalable, interpretable alternatives to factorization and neural models (Dugué et al., 2019, Kubota et al., 26 Aug 2025, Illendula et al., 2018, Li et al., 2023).
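
As a concrete illustration of the first two steps above, the following sketch builds a symmetric windowed co-occurrence matrix and row-normalizes it into conditional-probability vectors. The toy corpus, window size, and vocabulary are illustrative assumptions, not data from the cited work.

```python
import numpy as np

# Toy corpus of "scenes"; each inner list is one co-occurrence context.
corpus = [
    ["bench", "tree", "lamp", "bench"],
    ["tree", "lamp", "fountain"],
]
vocab = sorted({tok for doc in corpus for tok in doc})
idx = {tok: i for i, tok in enumerate(vocab)}
n = len(vocab)

# Symmetric co-occurrence counts: M[i, j] = number of times items i and j
# appear within `window` positions of each other.
window = 2
M = np.zeros((n, n))
for doc in corpus:
    for i, center in enumerate(doc):
        lo, hi = max(0, i - window), min(len(doc), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                M[idx[center], idx[doc[j]]] += 1

# Row-normalized conditional-probability embedding:
# v_i[j] = P(o_j | o_i) = M_ij / sum_k M_ik.
row_sums = M.sum(axis=1, keepdims=True)
V = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)

print(dict(zip(vocab, np.round(V[idx["tree"]], 2))))  # affinities of "tree"
```

Each row of V sums to one and reads directly as that item's empirical co-occurrence distribution, which is what makes this representation transparent to domain experts.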

2. Algorithmic Variants and Their Technical Properties

Static co-occurrence frameworks exhibit notable diversity in technical choices:

  • Explicit Probabilistic Models: Simple row-normalized count matrices ($v_i$) maximize empirical interpretability and are suitable for domains with small $n$ or where conditional affinity is itself the desired output (Gallardo et al., 9 Nov 2025).
  • PPMI/Statistical Weighting: Most large-scale word embeddings apply PMI, PPMI, or significance-threshold filtering to raw counts, mitigating skew from high-frequency items and emphasizing informative associations (Dugué et al., 2019); a PPMI-plus-SVD sketch follows this list. This weighting is especially relevant for memory- and compute-intensive applications.
  • Spectral and Community-based Dimensionality Reduction: Label propagation and eigen-decomposition of normalized Laplacians cluster items into communities or project them into $k$-dimensional spaces where each coordinate spans a community or latent class (Dugué et al., 2019, Kubota et al., 26 Aug 2025). Resulting vectors are often sparse, highly interpretable, and well suited to downstream clustering and alignment.
  • Neural Graph and Random Walk Embeddings: Algorithms such as LINE train node vectors to preserve first-order (immediate co-occurrence) or second-order (neighbor context) proximity via stochastic gradient methods with negative sampling, enabling efficient scaling to millions of items and flexible incorporation of heterogeneous edge types (Illendula et al., 2018, Li et al., 2023).
  • Tensor and $k$-way Factorization: Extending beyond matrices, the factorization of $k$-way co-occurrence tensors encodes higher-order semantics (the joint structure of phrases, visual regions, or multi-entity interactions), yielding theoretically and empirically justified gains over pairwise methods up to moderate $k$ (Bailey et al., 2017, Bollegala et al., 2017).
  • Pseudo-likelihood and Energy-based Models: For binary co-occurrence vectors or basket data, energy-based models (e.g., Fully Visible Boltzmann Machine, DEM) optimize conditional probabilities via maximum pseudo-likelihood, supporting general $L_k$ dependency and competitive performance on collaborative filtering and clustering (Shen et al., 2015).
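
As a minimal sketch of the PPMI-plus-factorization route discussed above, the following code applies positive PMI weighting to a toy count matrix and extracts low-rank embeddings via truncated SVD; the count values and the target dimensionality $d$ are assumptions for illustration.

```python
import numpy as np

def ppmi(M, eps=1e-12):
    """Positive pointwise mutual information of a co-occurrence count matrix."""
    total = M.sum()
    row = M.sum(axis=1, keepdims=True)  # marginal counts per item
    col = M.sum(axis=0, keepdims=True)
    # PMI_ij = log( P(i,j) / (P(i) P(j)) ) = log( M_ij * total / (row_i * col_j) )
    pmi = np.log((M * total + eps) / (row * col + eps))
    return np.maximum(pmi, 0.0)         # clip negative PMI to zero

# Toy symmetric count matrix standing in for real corpus statistics.
M = np.array([[0.0, 3.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])

d = 2                                   # target dimensionality, d << n
U, S, _ = np.linalg.svd(ppmi(M))
E = U[:, :d] * np.sqrt(S[:d])           # symmetric U * sqrt(Sigma) weighting
print(E)
```

The $U\sqrt{\Sigma}$ weighting is one common convention for symmetric matrices; plain $U$ or $U\Sigma$ are also used in practice.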

3. Interpretability and Evaluation of Static Co-occurrence Embeddings

Interpretability is a defining feature of many static co-occurrence approaches:

  • Direct Empirical Affinity: In explicit-probability embeddings, each coordinate directly signals empirical likelihood of co-occurrence, affording domain experts full transparency (e.g., Top-K recommendations in urban design map directly to the most likely complements) (Gallardo et al., 9 Nov 2025).
  • Semantic Compositionality and Clusters: Spectral/community and tensor-based embeddings yield coordinates directly aligned to interpretable groupings—semantic fields, communities, polysemous senses—allowing end-users to diagnose and manipulate the embedding space (Dugué et al., 2019, Bailey et al., 2017).
  • Recovery of Real-world Properties: Linear probing demonstrates that static co-occurrence-based embeddings encode accessible geographical, temporal, and environmental structure derived solely from text data, with $R^2$ up to 0.87 for geographic tasks (Barenholtz, 4 Mar 2026); a toy probing sketch follows this list.
  • Intrinsic and Extrinsic Benchmarks: Embedding quality is measured on word similarity (MEN, SimLex-999), categorization (ESSLI), analogical reasoning, and downstream prediction or retrieval accuracy. Approaches such as SemGloVe demonstrate that integrating semantic signals from LLMs can substantially enhance similarity performance (e.g., +16.6 points in Spearman correlation on word similarity datasets) (Gan et al., 2020).
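
A linear probe of the kind referenced above can be implemented in a few lines: fit a regularized linear model from fixed embeddings to a scalar property and report held-out $R^2$. The embeddings and the "latitude" target below are synthetic stand-ins, not the cited experimental setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
E = rng.normal(size=(500, 50))        # 500 items with 50-dim static embeddings
w = rng.normal(size=50)               # planted linear structure in the space
latitude = E @ w + rng.normal(scale=0.5, size=500)  # hypothetical property

E_tr, E_te, y_tr, y_te = train_test_split(E, latitude, random_state=0)
probe = Ridge(alpha=1.0).fit(E_tr, y_tr)
print("held-out R^2:", round(r2_score(y_te, probe.predict(E_te)), 3))
```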

4. Domain-specific Applications and Extensions

The scope of static co-occurrence embeddings extends far beyond classic word-level NLP:

  • Urban Design: Embeddings constructed from filtered scene co-occurrence matrices directly drive recommendation of object complements in urban planning platforms, supporting human-in-the-loop workflows that reflect real-world arrangement patterns (Gallardo et al., 9 Nov 2025).
  • Font Impression and Visual Embeddings: Embeddings for impression tags are obtained by spectral embedding of a co-occurrence graph induced by shared font annotations, enabling robust retrieval and guiding diffusion-based generative models for font synthesis, outperforming general-purpose models like BERT and CLIP in impression alignment (Kubota et al., 26 Aug 2025).
  • Emoji Usage in Social Media: Emoji co-occurrence graphs derived from tweets are embedded with scalable graph neural methods (LINE) and outperform semantic baselines in sentiment analysis and similarity ranking (Illendula et al., 2018).
  • Visual Co-occurrence in Vision-Language Tasks: Static word vectors trained from visual region/image co-occurrence matrices capture concept similarity unattainable from text alone; multi-task log-bilinear objectives allow a single vector to encode objectness, attributeness, and ontological relations, enhancing zero-shot classification and vision-language transfer (Gupta et al., 2019).
  • Knowledge Extraction and QA: Even simple nearest-neighbor retrieval in static fastText embeddings outperforms BERT on cloze-style factual queries when trained on large-vocabulary Wikipedia, at three orders of magnitude less energy cost, revealing that much "encyclopedic" knowledge resides in associative co-occurrence priors (Dufter et al., 2021).
  • Domain-adaptive and Online Embedding Refinement: Lightweight algorithms diffuse or update embeddings via local averaging over observed co-occurrence windows, adapting static vectors to new corpora, handling OOV tokens, and tracking semantic drift without neural model retraining (Kubek et al., 21 Apr 2025); a toy refinement pass is sketched below.
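
The local-averaging refinement idea in the last item can be sketched as a single corpus pass that pulls each token's vector toward the centroid of its observed context vectors; the update rule, learning rate, and toy corpus below are assumptions, not the cited algorithm's exact procedure.

```python
import numpy as np

def refine(emb, corpus, idx, window=2, lr=0.1):
    """One refinement pass: nudge each token's vector toward the centroid
    of the embeddings of its co-occurring context tokens."""
    new = emb.copy()
    for doc in corpus:
        for i, tok in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            ctx = [idx[doc[j]] for j in range(lo, hi) if j != i]
            if ctx:
                centroid = emb[ctx].mean(axis=0)
                new[idx[tok]] += lr * (centroid - new[idx[tok]])
    return new

vocab = ["bench", "tree", "lamp"]
idx = {t: i for i, t in enumerate(vocab)}
emb = np.random.default_rng(0).normal(size=(len(vocab), 4))
emb = refine(emb, [["bench", "tree", "lamp"]], idx)

# The same idea handles OOV tokens: initialize an unseen token's vector at
# the centroid of the context vectors it appears with.
```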

5. Limitations, Tradeoffs, and Practical Considerations

While static co-occurrence-based embeddings remain highly influential, key tradeoffs and domain-dependent choices apply:

  • Capacity and Scalability: Matrix and tensor factorization approaches are constrained by memory and computational cost, especially at high $k$ or with large vocabularies; graph/spectral and neural random-walk methods offer more stable scaling with corpus size (Li et al., 2023).
  • Data Sparsity and Overfitting: Higher-order ($k \geq 4$) co-occurrence tensors exhibit severe sparsity, suggesting that $k = 2$ (pairwise) or $k = 3$ is optimal for practical corpora (Bollegala et al., 2017).
  • Regularization and Smoothing: While some explicit models operate strictly on maximum-likelihood counts, others benefit from PMI weighting, negative sampling, and statistical thresholding to counteract frequency imbalances (Dugué et al., 2019, Gallardo et al., 9 Nov 2025).
  • Polysemy and Compositionality: Standard static embeddings associate one vector per item; only higher-order tensor approaches and multiplicative compositionality provide a partial account of context-dependent meaning (Bailey et al., 2017, Bollegala et al., 2017).
  • Non-contextuality: Static co-occurrence models cannot represent dynamic, context-aware sense shifts within a corpus; they assume global stationarity.
  • Interpretation Constraints: While row-normalized vector representations maximize transparency, classical low-rank factorization and neural models often yield opaque, rotationally ambiguous coordinates in which only relative distances are interpretable; a short demonstration follows this list.
  • Model Choice Dependence: Empirical analyses confirm that architectural decisions—count-based vs. prediction-based, CBOW vs. skip-gram, windowing policy, and context definition—shape cluster structure, sensitivity, and downstream utility, with skip-gram variants favoring fine-grained semantic associations (Bihani et al., 2020).
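
The rotational ambiguity noted above is easy to demonstrate: applying any orthogonal rotation to an embedding matrix changes every coordinate while leaving all pairwise cosine similarities (and Euclidean distances) intact, so individual dimensions of factorized or neural embeddings carry no fixed meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(10, 5))                  # toy embedding matrix
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal rotation

def cosine_matrix(X):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

# Identical similarity structure before and after rotation.
print(np.allclose(cosine_matrix(E), cosine_matrix(E @ Q)))  # True
```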

6. Emerging Directions and Hybrid Approaches

Recent research blends the strengths of static co-occurrence and contextual embedding paradigms:

  • Semantic-enriched Co-occurrence: Methods like SemGloVe replace local-window count matrices by context-aware, semantically weighted "co-occurrences" extracted from pretrained LLMs' self-attention or masked-LM distributions, yielding robust gains in both word similarity and extrinsic tagging tasks (Gan et al., 2020); a rough attention-based sketch follows this list.
  • Self-improving and OOV-aware Embeddings: Adaptive algorithms progressively refine or diffuse static embeddings using local co-occurrence structure, enabling rapid domain adaptation and on-the-fly induction of representations for previously unseen tokens with minimal computational overhead (Kubek et al., 21 Apr 2025).
  • Multimodal and Hybrid Graphs: Co-occurrence graphs may combine text-derived, visual, and knowledge-graph edges, supporting multi-faceted embeddings aligning with heterogeneous downstream objectives (e.g., joint word–emoji–hashtag embeddings for social media, semantic–visual fusion for scene understanding) (Dugué et al., 2019, Illendula et al., 2018, Gupta et al., 2019).
  • "Green" Scalability Advocacy: Empirical demonstrations reinforce that extremely large static vocabularies, combined with efficient co-occurrence-based embedding construction, can match or exceed transformer-based contextual models in factual recall while using less than 1% of the energy (Dufter et al., 2021).
  • Task-specific Co-occurrence Design: For specialized domains (e.g., urban design, font impression, biomedical terminology), tailoring co-occurrence context and normalization yields improved retrieval, generation, and interpretability (Gallardo et al., 9 Nov 2025, Kubota et al., 26 Aug 2025).
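
A rough sketch of attention-derived "semantic co-occurrence" in the spirit of the first item above: accumulate self-attention weights between token pairs from a pretrained masked LM in place of raw window counts. This is an illustrative approximation using the Hugging Face transformers API, not SemGloVe's actual extraction procedure.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tok("the park bench sits under a tree", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Average attention over all layers and heads -> soft pairwise affinities
# that could be accumulated into a "semantic" co-occurrence matrix.
att = torch.stack(out.attentions).mean(dim=(0, 2))[0]  # (seq_len, seq_len)
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
i = tokens.index("tree")
print({t: round(w, 3) for t, w in zip(tokens, att[i].tolist())})
```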

7. Summary Table: Major Classes of Static Co-occurrence Embedding Methods

| Method Class | Co-occurrence Statistic | Dimensionality / Factorization | Domain(s) |
|---|---|---|---|
| Row-normalized probability | Empirical counts $M_{ij}$ | $n$ (all classes) | Urban, tags |
| PPMI + SVD | $\mathrm{PPMI}(w_i, w_j)$ | Truncated SVD ($d \ll n$) | NLP, general |
| Prediction-based neural | Windowed pairs $(w, c)$ | $d$ (SGNS, CBOW, fastText) | NLP, general |
| Graph-based, spectral | Graph edges, symmetrized cond. probs | $k$ (eigenvectors), community size | Tags, word nets |
| Community detection | Weighted/PPMI-filtered edges | $K$ (number of communities, sparse) | Word nets |
| Tensor factorization | $k$-way counts/PPMI | $d$ via CP/JCP decomposition | NLP, polysemy |
| Semantic distillation | BERT attention/MLM-derived | GloVe objective with $X^{\mathrm{sem}}$ | NLP, hybrid |
| Self-improving/diffusion | Local corpus windows | Matches original $d$, all $V$ tokens | Domain-specific |

These categories synthesize the dominant algorithmic approaches, co-occurrence conceptualizations, and downstream specialties across the field, as empirically validated in recent arXiv research.
