Identifiability of latent causal graphical models without pure children (2505.18410v1)
Abstract: This paper considers a challenging problem of identifying a causal graphical model under the presence of latent variables. While various identifiability conditions have been proposed in the literature, they often require multiple pure children per latent variable or restrictions on the latent causal graph. Furthermore, it is common for all observed variables to exhibit the same modality. Consequently, the existing identifiability conditions are often too stringent for complex real-world data. We consider a general nonparametric measurement model with arbitrary observed variable types and binary latent variables, and propose a double triangular graphical condition that guarantees identifiability of the entire causal graphical model. The proposed condition significantly relaxes the popular pure children condition. We also establish necessary conditions for identifiability and provide valuable insights into fundamental limits of identifiability. Simulation studies verify that latent structures satisfying our conditions can be accurately estimated from data.
Summary
- The paper introduces novel sufficient conditions for model identifiability using a double triangular structure in the Γ-matrix.
- It establishes necessary criteria ensuring that latent causal models are uniquely determined even without the pure children requirement.
- Simulations validate that the proposed framework accurately recovers latent structures, enhancing causal inference in diverse data settings.
This paper addresses the challenging problem of identifying causal graphical models in the presence of unobserved (latent) variables, a crucial task in fields like psychology, education, and machine learning. While existing methods often impose stringent requirements, such as demanding multiple "pure children" (observed variables with only one latent parent) for each latent variable or restricting the latent causal graph structure, this work proposes significantly weaker conditions for model identifiability.
The core contributions of the paper are twofold:
- It introduces novel sufficient conditions for the identifiability of the entire causal model, including the number of latent variables (K), the latent causal graph (Λ), the latent-to-observed bipartite graph (Γ), and the associated probability distributions. These conditions are centered around a "double triangular" structure in the Γ matrix.
- It establishes necessary conditions for identifiability, providing insights into the fundamental limits of learning such models.
The proposed framework operates under a Binary Latent Causal Model (BLCM), which makes the following key assumptions:
- Latent Variables: Assumed to be binary (Hk∈{0,1}).
- Observed Variables: Can be of arbitrary types (discrete, continuous, mixed) from non-degenerate separable metric spaces.
- Assumption 1 (Measurement Model): There are no direct causal edges between observed variables, and no edges from observed to latent variables. Observed variables are thus noisy measurements of the latent variables.
- Assumption 2 (Basic Graphical Model Assumptions): The joint distribution P(X,H) satisfies the causal Markov property with respect to the true DAG G=(X∪H,E) and is faithful to G.
- Assumption 3 (Nondegeneracy):
- All 2^K configurations of latent variables have non-zero probability (P(H=h)>0).
- For each observed variable Xj, its conditional distribution P(Xj∣PaG(Xj)=hp) is distinct for every unique configuration hp of its latent parents.
- Every latent variable Hk has at least one observed child (i.e., no all-zero columns in Γ).
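To make the BLCM concrete, below is a minimal simulation sketch consistent with the assumptions above. The specific Γ, the "linear" latent graph H1→H2→H3, and all parameter values are illustrative assumptions (loosely mirroring the simulation setup described later), not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
K, J, N = 3, 8, 1000   # number of latents, observed variables, samples (illustrative)

# Hypothetical J x K latent-to-observed adjacency matrix Gamma: row j lists the
# latent parents of X_j.  Every latent has at least one child (Assumption 3).
Gamma = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
])

def sample_latents(n):
    """Binary latents from an assumed 'linear' latent DAG H1 -> H2 -> H3."""
    H = np.zeros((n, K), dtype=int)
    H[:, 0] = rng.binomial(1, 0.5, size=n)
    H[:, 1] = rng.binomial(1, np.where(H[:, 0] == 1, 0.8, 0.2))
    H[:, 2] = rng.binomial(1, np.where(H[:, 1] == 1, 0.8, 0.2))
    return H

def sample_observed(H):
    """Each X_j depends only on its latent parents (Assumption 1); modalities are mixed."""
    n = H.shape[0]
    X = np.zeros((n, J))
    for j in range(J):
        parents = H[:, Gamma[j] == 1]
        # Encode the parent configuration as an integer so that distinct
        # configurations yield distinct conditional distributions (Assumption 3).
        code = parents @ (2 ** np.arange(parents.shape[1]))
        max_code = 2 ** parents.shape[1] - 1
        if j < 4:                                   # binary measurements
            X[:, j] = rng.binomial(1, 0.1 + 0.8 * code / max_code)
        else:                                       # continuous (Normal) measurements
            X[:, j] = rng.normal(loc=code, scale=1.0)
    return X

H = sample_latents(N)
X = sample_observed(H)
```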
The paper uses a strong statistical notion of identifiability, meaning that if two sets of parameters produce the same observed data distribution P(X), then the parameters themselves must be equivalent (up to known ambiguities like label permutation of latents, sign-flipping of binary latent states, and Markov equivalence of the latent graph Λ).
Sufficient Conditions for Identifiability
The central sufficient condition is the double triangular Γ-matrix:
- Definition 3 (Triangular Γ-matrix): A K×K binary matrix Γ1 (a sub-matrix of Γ) is "triangular" if, after some row and column permutations, it becomes a lower (or upper) triangular matrix with ones on the diagonal.
$$
\Gamma_1 = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
* & 1 & 0 & \cdots & 0 \\
* & * & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
* & * & * & \cdots & 1
\end{pmatrix},
$$
where each * entry can be 0 or 1.
- Definition (Double Triangular): The J×K latent-to-observed adjacency matrix Γ is "double triangular" if its rows can be permuted such that it contains two disjoint K×K triangular sub-matrices, Γ1 and Γ2, i.e.,
$$
\Gamma = \begin{pmatrix} \Gamma_1 \\ \Gamma_2 \\ \Gamma_3 \end{pmatrix},
$$
where Γ1, Γ2 are K×K triangular and Γ3 is the remaining (J−2K)×K part. This condition significantly relaxes the "two pure children per latent" requirement (which corresponds to Γ1 = Γ2 = I_K).
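For small models, these structural definitions can be checked directly by brute force. The sketch below is our own exponential-time illustration, not an algorithm from the paper (which only notes that greedy searches are possible for larger K and J).

```python
import numpy as np
from itertools import permutations, combinations

def is_triangular(M):
    """Definition 3: a K x K binary matrix is triangular if some row/column
    permutation makes it lower triangular with ones on the diagonal.
    (Checking the lower-triangular form suffices: an upper-triangular
    arrangement becomes lower triangular after reversing rows and columns.)"""
    K = M.shape[0]
    for rp in permutations(range(K)):
        for cp in permutations(range(K)):
            P = M[np.ix_(rp, cp)]
            if np.all(np.diag(P) == 1) and np.all(np.triu(P, 1) == 0):
                return True
    return False

def is_double_triangular(Gamma):
    """Brute-force check: do two disjoint sets of K rows of Gamma each form a
    triangular K x K sub-matrix?  Exponential in J and K; a sketch, not an
    efficient algorithm."""
    J, K = Gamma.shape
    if J < 2 * K:
        return False
    for S1 in combinations(range(J), K):
        if not is_triangular(Gamma[list(S1)]):
            continue
        rest = [r for r in range(J) if r not in S1]
        for S2 in combinations(rest, K):
            if is_triangular(Gamma[list(S2)]):
                return True
    return False
```

As a sanity check, a Γ containing two pure children per latent (Γ1 = Γ2 = I_K) passes this check trivially, while many Γ matrices without any pure children pass as well, which is precisely the relaxation the definition provides.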
Key Theorems (Sufficient Conditions):
- Theorem 1 (Identifying K): If the true Γ-matrix is double triangular, the number of latent variables K is identifiable.
- Proof Idea: The double triangular structure ensures (via Lemma 1) that certain marginal probability tables involving observed variables (e.g., P(XS1,XS2), where XS1, XS2 correspond to rows of Γ1, Γ2) have rank 2^K. An alternative model with K′<K latents would yield rank at most 2^K′, leading to a contradiction (a numerical illustration of this rank argument follows this list).
- Definition 4 (Subset Condition): The Γ-matrix satisfies the subset condition if for any two distinct latent variables Hk,Hl, the set of observed children of Hk, ChX(Hk), is not a subset of ChX(Hl), and vice-versa. This means no column in Γ is "dominated" by another.
- Theorem 2 (Identifiability of Model Components): Given a known K, if the Γ-matrix is double triangular and satisfies the subset condition, then the remaining model components (the bipartite graph Γ, the latent distribution P(H), the conditional distributions P(Xj∣H), and the latent causal graph Λ up to Markov equivalence) are identifiable up to the equivalences described above.
- Proof Idea: Involves a three-way tensor decomposition of the observed data distribution P(XS1,XS2,XS3). Kruskal's theorem guarantees the uniqueness of this decomposition under the full rank conditions provided by Lemma 1 (due to Γ1,Γ2 being triangular) and distinctness of columns in P(XS3∣H) (due to non-empty Γ3 columns and nondegeneracy). The subset condition helps resolve ambiguities in mapping tensor components back to specific latent variable configurations.
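The rank argument referenced above can be illustrated numerically: with Γ1 and Γ2 triangular, the marginal table P(XS1, XS2) factors as A·diag(P(H))·Bᵀ, where A = P(XS1∣H) and B = P(XS2∣H) have full column rank 2^K. Below is a toy sketch with K = 2, two binary observed variables per group, and made-up parameter values; the numbers and variable layout are illustrative assumptions, not the paper's example.

```python
import numpy as np
from itertools import product

K = 2
configs = list(product([0, 1], repeat=K))      # the 2^K latent configurations

# Made-up, nondegenerate parameters.  X1, X2 realize a triangular Gamma_1
# (rows (1,0) and (1,1)); X3, X4 realize a triangular Gamma_2 (rows (1,1) and (0,1)).
pH = np.array([0.30, 0.20, 0.25, 0.25])        # P(H = h) > 0 for every configuration
p1 = {h: [0.2, 0.8][h[0]] for h in configs}                    # P(X1 = 1 | H1)
p2 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}      # P(X2 = 1 | H1, H2)
p3 = {(0, 0): 0.15, (0, 1): 0.35, (1, 0): 0.65, (1, 1): 0.85}  # P(X3 = 1 | H1, H2)
p4 = {h: [0.25, 0.75][h[1]] for h in configs}                  # P(X4 = 1 | H2)

def group_table(pa, pb):
    """Rows: joint values (xa, xb); columns: latent configurations h.
    Entries are P(Xa = xa, Xb = xb | H = h), using conditional independence given H."""
    T = np.zeros((4, len(configs)))
    for row, (xa, xb) in enumerate(product([0, 1], repeat=2)):
        for col, h in enumerate(configs):
            qa = pa[h] if xa == 1 else 1 - pa[h]
            qb = pb[h] if xb == 1 else 1 - pb[h]
            T[row, col] = qa * qb
    return T

A = group_table(p1, p2)               # P(X_S1 | H): full column rank 2^K (Lemma 1)
B = group_table(p3, p4)               # P(X_S2 | H): likewise
joint = A @ np.diag(pH) @ B.T         # marginal table P(X_S1, X_S2)
print(np.linalg.matrix_rank(joint))   # 4 = 2^K; a model with K' < K latents caps the rank at 2^K'
```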
The paper highlights that these results establish identifiability against any alternative model satisfying the basic BLCM assumptions, not just alternatives that also meet the double triangular condition. This is a stronger claim than the "recoverability" guarantees often found in the causal discovery literature.
Necessary Conditions for Identifiability
The paper also establishes conditions that must hold for identifiability:
- Theorem 3 (Three Measurements per Latent): For a BLCM with known K and categorical observed variables to be identifiable, each latent variable Hk must have at least three observed children. This is stricter than the often-assumed two measurements.
- Theorem 4 (Subset Condition): For a BLCM with known K≥2 and a known Γ-matrix to be identifiable, Γ must satisfy the subset condition (Definition 4). This is crucial for disentangling latent variable influences and resolving sign-flipping ambiguities.
Counterexamples are provided to illustrate why models violating these necessary conditions (or the nondegeneracy assumptions) become non-identifiable.
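A quick programmatic screen for these necessary conditions on a candidate Γ is sketched below (necessary but not sufficient for identifiability; the function is our illustration, not code from the paper).

```python
import numpy as np

def check_necessary_conditions(Gamma):
    """Check the two necessary conditions above on a J x K binary Gamma matrix:
    (i) every latent variable has at least three observed children (Theorem 3);
    (ii) no latent's child set is contained in another's (the subset condition,
    Definition 4 / Theorem 4)."""
    Gamma = np.asarray(Gamma)
    K = Gamma.shape[1]
    three_children = bool(np.all(Gamma.sum(axis=0) >= 3))
    subset_ok = True
    for k in range(K):
        for l in range(K):
            if k != l and np.all(Gamma[:, k] <= Gamma[:, l]):
                subset_ok = False   # children of H_k are a subset of children of H_l
    return three_children, subset_ok
```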
Experimental Validation
Simulations were conducted to empirically verify the theoretical identifiability results.
- Setup: A BLCM with K=3 latent variables and J=8 observed variables (structure from Figure 1, which satisfies the double triangular condition). Observed variables included binary and continuous (Normal, Cauchy) types. Different latent graph structures (Λ) were tested: linear, collider, and fully dependent.
- Method: A score-based estimator was used. Continuous responses were dichotomized, and a penalized EM algorithm (adapted from Ma et al., 2023) was employed to estimate Γ and P(H). Λ was then learned from the estimated P(H) using Greedy Equivalence Search (GES).
- Results: The estimated Γ and Λ (measured by Structural Hamming Distance, SHD) showed good accuracy, which improved with increasing sample sizes (N=1000, 5000, 10000). This supports the claim that complex latent structures satisfying the proposed weak conditions can be accurately recovered from data.
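In its simplest form, SHD is the number of disagreeing entries between two adjacency matrices; for the bipartite graph Γ one also has to account for the latent-label permutation ambiguity noted earlier. A minimal sketch of this metric follows (our simplification; the paper's exact evaluation protocol may differ).

```python
import numpy as np
from itertools import permutations

def shd(est, true):
    """Structural Hamming Distance between two binary adjacency matrices
    of the same shape: the number of entries on which they disagree."""
    return int(np.sum(np.asarray(est) != np.asarray(true)))

def shd_gamma(gamma_est, gamma_true):
    """SHD for the latent-to-observed graph, minimized over column permutations
    of the estimate because latent labels are only identified up to permutation."""
    K = gamma_true.shape[1]
    return min(shd(gamma_est[:, list(p)], gamma_true)
               for p in permutations(range(K)))
```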
Comparison with Related Work (from Supplement S.1)
- Kivva et al. (2021): Allows arbitrary categorical latents but relies on a strong "mixture oracle" assumption (to determine the number of mixture components in marginals), which may not hold in the paper's general nonparametric setting. Their tensor decomposition also differs, using log number of mixture components.
- Chen et al. (2024): Handles polytomous latents but requires three pure children per latent, stricter cardinality constraints on observed vs. latent variables, and stronger full-rankness nondegeneracy.
- Lee et al. (2025): Focuses on binary latents with parametric GLM assumptions for P(Xj∣H) and typically requires two pure children for strict identifiability, or achieves generic identifiability under GLMs without pure children. This paper offers nonparametric identifiability with weaker structural assumptions (double triangular instead of pure children).
Practical Implications and Implementation
The paper's theoretical results have significant practical implications:
- Broader Applicability: The relaxed conditions allow for the identification of more complex and realistic causal models where latents may not have pure children and observed data is multi-modal.
- Educational Assessment: As an example, student skills (binary latent) can be identified from diverse test responses (observed) even if questions test multiple skills.
- Algorithm Development: The identifiability proofs (especially the tensor decomposition aspect) can guide the development of new learning algorithms. While the paper uses a penalized EM approach for experiments, the theoretical framework using Kruskal's theorem suggests other potential algorithmic avenues.
- Implementation Considerations:
- Verifying Double Triangular Condition: For a given K, this involves searching for specific sub-structures in Γ. This might be computationally intensive for large K,J but can be approached greedily.
- Tensor Decomposition: Practical application of Kruskal's theorem requires robust tensor decomposition algorithms, which can be challenging with finite noisy data. The experimental algorithm sidesteps direct tensor decomposition for estimation.
- Estimating K: While Theorem 1 shows K is identifiable, practical estimation of K often relies on model selection criteria (BIC, etc.) or domain knowledge; a BIC-based sketch follows this list.
- Computational Requirements: The penalized EM algorithm used in experiments is iterative. Complexity will depend on N,J,K, and the number of EM iterations.
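To illustrate the model-selection point referenced in the "Estimating K" item above, here is a generic BIC comparison across candidate values of K. The fitting routine `fit_blcm` is hypothetical (standing in for, e.g., a penalized EM fit as used in the experiments); it is not provided by the paper.

```python
import numpy as np

def select_K(X, candidate_Ks, fit_blcm):
    """Pick the number of latent variables by BIC.  `fit_blcm(X, K)` is a
    hypothetical fitting routine returning (log_likelihood, n_parameters)."""
    N = X.shape[0]
    bics = {}
    for K in candidate_Ks:
        loglik, n_params = fit_blcm(X, K)
        bics[K] = -2.0 * loglik + n_params * np.log(N)   # standard BIC penalty
    best_K = min(bics, key=bics.get)
    return best_K, bics
```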
Discussion and Future Work
The authors suggest several avenues for future research:
- Extending beyond measurement models to include direct causal links between observed variables.
- Generalizing identifiability conditions to other types of latent variables (e.g., general categorical, continuous, or mixed).
- Developing more sophisticated nonparametric causal discovery algorithms specifically tailored to these weaker identifiability guarantees, potentially leveraging the full information in continuous responses without dichotomization.
In summary, this paper makes a valuable contribution by significantly weakening the conditions required for identifying latent variable causal models with binary latents. The proposed "double triangular" condition on the latent-to-observed graph, combined with standard assumptions, provides a theoretical foundation for learning more complex causal structures from diverse types of observed data, opening up new possibilities for causal inference in various application domains.