Latent Variable Greedy Equivalence Search (LGES)
- Latent Variable Greedy Equivalence Search is a score-based algorithm designed to recover causal structures in partially observed linear models by decomposing covariance into sparse and low-rank components.
- It employs specialized operators for latent-observed edge deletion and latent structure refinement, ensuring efficient exploration of the Markov equivalence class.
- Empirical results show that LGES achieves robust F1 scores, lower structural Hamming distance (SHD), and interpretable latent factor recovery under realistic Generalized N Factor Model (GNFM) assumptions.
Latent Variable Greedy Equivalence Search (LGES) refers to a class of score-based greedy search algorithms designed for structure identification in causal graphical models, particularly in the presence of latent (unobserved) variables. LGES generalizes classical Greedy Equivalence Search (GES) to partially observed systems, aiming to recover the Markov equivalence class of models that reproduce the observed joint statistics, with rigorous identifiability guarantees. Recent developments have established the first globally consistent score-based framework for latent variable recovery in linear structural models, notably under the Generalized N Factor Model assumption (Dong et al., 5 Oct 2025).
1. Formal Definition and Scope
LGES is a score-based greedy search algorithm over the space of partially observed linear causal models. Given samples of the observed variables (and possibly associated covariates), where the underlying data-generating process includes unobserved latent variables, LGES seeks to identify the optimal graphical structure, including both measurement and latent variable relationships, up to Markov equivalence. The algorithm is applicable even when no prior information (e.g., the number or locations of latents) is available. Its identifiability guarantees rely on the ability to decompose the covariance structure of the observed variables into direct (sparse) and latent (low-rank or factor-structured) contributions.
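The sparse-plus-low-rank decomposition can be illustrated on a toy factor model; the loadings `Lambda` and error variances `omega` below are arbitrary illustrative values, not parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: 6 observed variables loading on 2 independent latent factors,
# X = Lambda @ L + eps.  (Illustrative values, not from the paper.)
n_obs, n_lat = 6, 2
Lambda = rng.normal(size=(n_obs, n_lat))    # latent-to-observed loadings
omega = rng.uniform(0.5, 1.5, size=n_obs)   # independent error variances

low_rank = Lambda @ Lambda.T                # latent-mediated contribution
Sigma = low_rank + np.diag(omega)           # observed covariance

# The latent contribution has rank equal to the number of latents,
# while the remainder is diagonal (the "sparse" direct part).
assert np.linalg.matrix_rank(low_rank) == n_lat
assert np.allclose(Sigma - low_rank, np.diag(omega))
```

The rank of the latent contribution bounds the number of latent factors, which is what makes the decomposition informative for structure recovery.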
2. Generalized N Factor Model
Central to the identifiability and effectiveness of LGES is the Generalized N Factor Model (GNFM) framework (Dong et al., 5 Oct 2025):
- The latent variable model is defined as a directed acyclic graph (DAG) over observed and latent variables, with mutually nonadjacent latent variables grouped into sets $\mathbf{L}_1, \dots, \mathbf{L}_k$.
- For each latent group $\mathbf{L}_i$, there exist sufficiently many observed "effect" variables whose parent set is exactly $\mathbf{L}_i$.
- Latent group membership propagates required equality constraints to the observed covariance matrix, enabling algebraic identifiability.
- If any variable, observed or latent, is causally related to a latent variable in $\mathbf{L}_i$, it must have the same relation to every member of $\mathbf{L}_i$.
The GNFM extends classical one-factor models, and, under ML score maximization together with minimum dimension regularization, LGES provably recovers the correct Markov equivalence class in the sample limit.
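In the classical one-factor special case that the GNFM generalizes, the induced covariance equality constraints are the well-known tetrad constraints; a minimal numerical check, using hypothetical loadings:

```python
import numpy as np

# One-factor model: four observed "effect" variables sharing a single
# latent parent L, X_i = lam_i * L + eps_i.  (Hypothetical values.)
lam = np.array([0.8, 1.2, 0.6, 1.5])      # loadings on L
omega = np.array([1.0, 0.7, 1.3, 0.9])    # error variances

# Population covariance: sigma_ij = lam_i * lam_j for i != j.
Sigma = np.outer(lam, lam) + np.diag(omega)

# Tetrad (rank-one) equality constraint on the observed covariance:
tetrad = Sigma[0, 1] * Sigma[2, 3] - Sigma[0, 2] * Sigma[1, 3]
assert abs(tetrad) < 1e-12   # vanishes exactly in the population
```

Constraints of this kind are what a correctly specified graph must impose on the observed covariance, and what the score-based search implicitly tests.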
3. Algorithmic Framework
LGES operates in two core phases (Dong et al., 5 Oct 2025):
- Latent-Observed Edge Deletion: Initialization with a "supergraph" state containing all putative latent variables, each posited to cause all observed variables. Edge deletions from latent to observed nodes are greedily proposed and accepted if they do not degrade the maximum likelihood (ML) score by more than a pre-specified tolerance $\epsilon$.
- Latent Structure Refinement: Once a minimal latent-observed structure is inferred, additional edge deletions or orientations among latents are performed, subject again to score preservation. This phase eliminates unnecessary latent-latent connections, refining the structure within the Markov equivalence class.
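The two phases can be sketched schematically; the `score` callable, the edge representation, and the `epsilon` handling below are illustrative placeholders, not the paper's implementation:

```python
# Schematic sketch of the greedy two-phase search described above.
# `score` and `epsilon` are placeholders for the ML score and tolerance.

def lges_sketch(data, latents, observed, score, epsilon):
    # Phase 0: supergraph state -- every putative latent causes every observed.
    edges = {(l, o) for l in latents for o in observed}
    best = score(edges, data)

    # Phase 1: greedily delete latent-observed edges that do not
    # degrade the score by more than epsilon.
    for edge in sorted(edges):
        candidate = edges - {edge}
        s = score(candidate, data)
        if s >= best - epsilon:
            edges, best = candidate, max(best, s)

    # Phase 2 (latent structure refinement) would analogously delete or
    # orient latent-latent edges under the same score-preservation rule.
    return edges
```

With a toy score that simply penalizes edge count, e.g. `lges_sketch(None, ["L1"], ["X1", "X2"], lambda e, d: -len(e), 0.0)`, every deletion is accepted and the sketch returns the empty edge set.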
At all steps, the state is represented as a CPDAG (Completed Partially Directed Acyclic Graph) to efficiently encode equivalence classes. Two operator types navigate the search space:
- Latent-observed deletion: removes all edges from a latent set $\mathbf{L}$ to an observed set $\mathbf{O}$ in the current CPDAG $\mathcal{C}$.
- Latent-latent deletion: removes all edges between latent groups $\mathbf{L}_i$ and $\mathbf{L}_j$, with additional orientations toward a helper set as required by the GNFM.
The scoring function is the profiled Gaussian log-likelihood of the hypothesized graph $\mathcal{G}$,

$$\mathcal{S}(\mathcal{G}) = \max_{B,\, \Omega} \; \ell_n\big(\Sigma(B, \Omega)\big), \qquad \Sigma(B, \Omega) = (I - B)^{-1}\, \Omega\, (I - B)^{-\top},$$

where $B$ encodes structural coefficients and $\Omega$ encodes error variances, both constrained to the pattern permitted by $\mathcal{G}$.
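The ML score for a linear Gaussian SEM can be sketched as follows; this is the standard parameterization for such models and may differ in detail from the paper's, and the example coefficients are hypothetical:

```python
import numpy as np

def implied_covariance(B, Omega):
    """Covariance implied by the linear SEM X = B X + eps, Cov(eps) = Omega."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return A @ Omega @ A.T

def gaussian_loglik(S, Sigma, n):
    """Gaussian log-likelihood of sample covariance S under model covariance
    Sigma, up to an additive constant."""
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * (logdet + np.trace(np.linalg.solve(Sigma, S)))

# Example: X2 <- X1 with coefficient 0.7 (hypothetical values).
B = np.array([[0.0, 0.0],
              [0.7, 0.0]])
Omega = np.diag([1.0, 0.5])
Sigma = implied_covariance(B, Omega)   # [[1.0, 0.7], [0.7, 0.99]]
```

By the Gaussian MLE property, `gaussian_loglik(S, Sigma, n)` is maximized over `Sigma` at `Sigma = S`, which is what the inner maximization over the free parameters of a graph exploits.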
4. Identifiability and Global Consistency
Under mild graphical assumptions enforced by the GNFM (specifically, sufficient observed coverage for each latent group), the score-based greedy search attains global consistency:
- In the large-sample regime, the algorithm selects the minimum-dimension graph maximizing ML score.
- The selected graph imposes the same set of covariance equality constraints as the true data-generating graph.
- For GNFM classes, algebraic equivalence implies recovery of the full Markov Equivalence Class (MEC): all features inferable from the observational distribution are reconstructed.
- The tolerance parameter $\epsilon$ is chosen as a decreasing function of the sample size $n$, analogously to the BIC penalty in classical GES.
5. Operator Properties and Search Efficiency
LGES leverages tailored operators for efficient navigation:

| Operator | Purpose | Acceptance Criterion |
|:-----------------|:------------------------------------------|:----------------------------------------|
| Latent-observed deletion | Delete latent-observed edges | ML score does not degrade by more than $\epsilon$ |
| Latent-latent deletion | Delete and orient latent-latent edges | ML score does not degrade by more than $\epsilon$ |
Each operator maintains the current CPDAG as a supergraph of the true structure, thereby avoiding "over-pruning" and facilitating parallelized evaluation. Deletions monotonically reduce free parameters, leading to efficient exploration even in high-dimensional settings.
6. Evaluation and Applications
Empirical studies (Dong et al., 5 Oct 2025) demonstrate:
- LGES achieves superior F1 scores and lower SHD compared to constraint-based approaches (FOFC, GIN, RLCD) on synthetic data matching GNFM assumptions.
- Robust performance under misspecification (non-Gaussian noise, mild non-linearity) owing to reliance on covariance constraints.
- Inference of interpretable latent structures in real-world datasets, such as personality, burnout, and multitasking behavior, with well-calibrated model fit metrics (RMSEA, CFI, TLI). Extracted latent variables correspond to established psychological factors and reveal item-level cross-loadings.
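The evaluation metrics are standard; a minimal sketch of SHD and edge-level F1 on adjacency matrices, with hypothetical helper names:

```python
import numpy as np

def shd(A_true, A_est):
    """Structural Hamming distance: number of variable pairs whose edge
    status (absent / i->j / j->i) differs between the two graphs."""
    d, p = 0, A_true.shape[0]
    for i in range(p):
        for j in range(i + 1, p):
            if (A_true[i, j], A_true[j, i]) != (A_est[i, j], A_est[j, i]):
                d += 1
    return d

def edge_f1(A_true, A_est):
    """F1 score over directed edges (1 = edge present in adjacency matrix)."""
    tp = np.sum((A_true == 1) & (A_est == 1))
    fp = np.sum((A_true == 0) & (A_est == 1))
    fn = np.sum((A_true == 1) & (A_est == 0))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, comparing a chain `X1 -> X2 -> X3` against its fully reversed version yields an SHD of 2 (both edges flipped) and a directed-edge F1 of 0.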
7. Relation to Broader Latent Variable Model Selection
LGES extends the scope and guarantees of previous latent variable graphical model selection frameworks (Chandrasekaran et al., 2010, Frot et al., 2015) by explicitly handling partially observed systems through direct score-based search. The method incorporates high-dimensional consistency, convex optimization insights, and geometric conditions for identifiability—ensuring unique decomposition of observed covariances into sparse and low-rank components according to tangent space transversality. This synthesis places LGES at the intersection of score-based equivalence search, algebraic latent structure disentanglement, and practical structure discovery in the presence of hidden variables.
In summary, Latent Variable Greedy Equivalence Search (LGES) provides a consistent, principled, and scalable framework for causal structure learning in the presence of latent variables. Its design leverages global covariance constraints, specialized operator definitions, and careful regularization to achieve algebraic and Markov equivalence recovery under realistic graphical assumptions, marking a significant advance for latent variable discovery in empirical sciences.