Multi-Domain Causal Discovery in Bijective Causal Models (2504.21261v1)

Published 30 Apr 2025 in cs.LG, cs.AI, and stat.ME

Abstract: We consider the problem of causal discovery (a.k.a., causal structure learning) in a multi-domain setting. We assume that the causal functions are invariant across the domains, while the distribution of the exogenous noise may vary. Under causal sufficiency (i.e., no confounders exist), we show that the causal diagram can be discovered under less restrictive functional assumptions compared to previous work. What enables causal discovery in this setting is bijective generation mechanisms (BGM), which ensures that the functional relation between the exogenous noise $E$ and the endogenous variable $Y$ is bijective and differentiable in both directions at every level of the cause variable $X = x$. BGM generalizes a variety of models including additive noise model, LiNGAM, post-nonlinear model, and location-scale noise model. Further, we derive a statistical test to find the parents set of the target variable. Experiments on various synthetic and real-world datasets validate our theoretical findings.

Summary

  • The paper introduces a novel multi-domain causal discovery approach that leverages bijective generation mechanisms and the invariance of causal mechanisms across domains to infer causal directions from observational data.
  • It develops a practical methodology based on density-vectorization and independence testing to identify parent sets when noise distributions vary across domains.
  • Experimental results demonstrate superior accuracy and precision compared to existing methods, despite challenges with data requirements and computational complexity.

This paper, "Multi-Domain Causal Discovery in Bijective Causal Models" (2504.21261), addresses the problem of learning causal structure (represented as a directed acyclic graph) from observational data collected across multiple distinct domains or environments. This setting is challenging because the distributions of variables can vary significantly across domains, potentially confounding standard causal discovery algorithms that rely on distributional properties within a single dataset.

The core problem is to identify the causal graph when the causal mechanisms (the functions relating variables to their parents and noise) are invariant across domains, but the distributions of the exogenous noise can change.
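
As a concrete illustration of the setting (a hypothetical simulation written for this summary, not an experiment from the paper), the structural equation below is shared by all domains while the noise scale differs per domain:

```python
import numpy as np

rng = np.random.default_rng(1)
noise_scales = [0.5, 1.0, 2.0]  # the exogenous noise distribution differs across the three domains

def simulate_domain(scale, n=500):
    """Every domain uses the same mechanism Y = sin(X) + E;
    only the scale of the noise E changes from domain to domain."""
    x = rng.normal(size=n)
    e = rng.normal(scale=scale, size=n)
    y = np.sin(x) + e
    return x, y

# One (X, Y) sample per domain, all generated by the invariant mechanism.
domains = [simulate_domain(s) for s in noise_scales]
```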

The paper introduces the concept of Bijective Generation Mechanisms (BGM) as a key assumption. Under BGM, for any variable $V_i$ and any fixed value of its parents $\mathbf{PA}_i = pa_i$, the function mapping the exogenous noise $E_i$ to the variable $V_i$ (i.e., $f_i(pa_i, \cdot): \mathcal{E}_i \to \mathcal{V}_i$) is assumed to be a diffeomorphism (bijective and differentiable with a differentiable inverse) for continuous variables, or simply a bijection for discrete variables. This assumption is shown to generalize several common noise models, including additive noise models, LiNGAM, post-nonlinear models, and location-scale noise models, making the proposed method applicable to a wider range of real-world scenarios.
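
As a concrete check of the BGM condition for one of these model classes (using generic functions $\mu$ and $\sigma$ for illustration): a location-scale noise model takes the form $Y = \mu(X) + \sigma(X)\,E$ with $\sigma(x) > 0$. For any fixed $x$, the map $e \mapsto \mu(x) + \sigma(x)\,e$ is differentiable with differentiable inverse $e = (y - \mu(x))/\sigma(x)$, i.e., a diffeomorphism, so BGM holds at every level $X = x$; additive noise models are the special case $\sigma \equiv 1$, and LiNGAM further restricts $\mu$ to be linear.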

The central theoretical contribution is based on the notion of "similarity" between random variables in a multi-domain setting. Two random variables $A$ and $B$ are defined as "similar" ($A \sim B$) if there exists a diffeomorphism (or bijection for discrete variables) $g$ such that $g(A)$ has the same distribution as $B$ in every domain. The paper proves that under the BGM assumption, if $X$ is a cause of $Y$, then the conditional distributions of $Y$ given different values of $X$ (denoted $\tilde{Y}_x$) are pairwise similar, i.e., $\forall a, b \in \mathcal{X}: \tilde{Y}_a \sim \tilde{Y}_b$.
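
To see intuitively why this holds (a sketch of the idea, not necessarily the paper's exact argument): under BGM and causal sufficiency, $\tilde{Y}_x$ is distributed as $f(x, E)$, where $f$ is the domain-invariant mechanism and $E$ the exogenous noise. Taking $g := f(b, \cdot) \circ f(a, \cdot)^{-1}$, which is a diffeomorphism by assumption, $g(\tilde{Y}_a)$ has the same distribution as $f(b, E)$, i.e., as $\tilde{Y}_b$, in every domain, since only the distribution of $E$ varies across domains while $g$ does not; hence $\tilde{Y}_a \sim \tilde{Y}_b$.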

To make this testable, the paper introduces the "density-vectorization" operator $\Phi_V(v)$, which takes a point $v$ in the support of a variable $V$ and returns a normalized vector of its density (or probability mass) across all domains at that point. This vector $\Phi_V(v)$ lies on the $m$-dimensional simplex, where $m$ is the number of domains. The "special random variable" $\Psi_V$ is then defined as $\Psi_V = \Phi_V(V)$, which captures the probabilistic profile of $V$'s density across domains for a randomly sampled value of $V$. A crucial result (Proposition 2) is that if two random variables are similar, their corresponding special random variables are identical across all domains ($\Psi_A \overset{I}{=} \Psi_B$).
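
A minimal sketch of this operator for a one-dimensional continuous variable, assuming per-domain Gaussian kernel density estimates (the function name `density_vectorize` and the use of `scipy.stats.gaussian_kde` are illustrative choices, not the authors' implementation):

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_vectorize(samples_per_domain, v):
    """Evaluate Phi_V at the point(s) v: the vector of per-domain densities
    of V at v, normalized to lie on the simplex.

    samples_per_domain: list of 1-D arrays, one array of observations of V per domain.
    v: scalar or 1-D array of evaluation points.
    """
    # One kernel density estimate per domain (illustrative choice of estimator).
    kdes = [gaussian_kde(s) for s in samples_per_domain]
    # Stack the m density values at each point into a vector of length m.
    dens = np.stack([kde(np.atleast_1d(v)) for kde in kdes], axis=-1)  # shape (n_points, m)
    # Normalize each vector so its entries sum to one.
    return dens / dens.sum(axis=-1, keepdims=True)

# Samples of Psi_V for domain i are obtained by evaluating Phi_V at that
# domain's own observations: density_vectorize(samples_per_domain, samples_per_domain[i]).
```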

Building on this, the paper defines $\Gamma_{X \to Y} := \Psi_{\tilde{Y}_X}$. The main theoretical finding (Theorem 1) states that under the BGM assumption, the causal direction $X \to Y$ implies that the special random variable $\Gamma_{X \to Y}$ is statistically independent of $X$ in every domain ($\Gamma_{X \to Y} \perp\!\!\!\perp X$). This provides a practical criterion for causal discovery: if $\Gamma_{X \to Y}$ is not independent of $X$ in at least one domain, the hypothesis that $X$ causes $Y$ can be rejected. The extensions to discrete and multivariate cases show that this independence test can be applied to a candidate parent set $\mathbf{S}$ of a variable $V_i$ by checking $\Gamma_{\mathbf{S} \to V_i} \perp\!\!\!\perp \mathbf{S}$.

For practical implementation, the algorithm involves the following steps (a code sketch of the pipeline follows the list):

  1. Conditional Density Estimation: Estimate the conditional density (or mass) function $p^i_{V_i|\mathbf{S}}(v_i|\mathbf{s})$ for each domain $i$ and each candidate parent set $\mathbf{S}$ of a variable $V_i$. Non-parametric methods such as kernel density estimation (e.g., using the np package in R, as mentioned by the authors) can be used.
  2. Sample Generation for $\Gamma_{\mathbf{S} \to V_i}$: For each observed data point $(\mathbf{s}_k, v_{i,k})$ from domain $i$, construct the density vector across all domains: $(p^1_{V_i|\mathbf{S}}(v_{i,k}|\mathbf{s}_k), \dots, p^m_{V_i|\mathbf{S}}(v_{i,k}|\mathbf{s}_k))$. Normalize this vector to obtain a sample of $\Gamma_{\mathbf{S} \to V_i}$ for that data point and domain.
  3. Independence Testing: For each domain $i$, perform a statistical test for independence between the generated samples of $\Gamma_{\mathbf{S} \to V_i}$ and the corresponding values of $\mathbf{S}$. A d-variable HSIC test is suggested.
  4. Aggregation: Combine the p-values from the independence tests across all domains. The minimum p-value is used; if it falls below a significance threshold, the hypothesis that $\mathbf{S}$ is the parent set is rejected.
  5. Parent Set Discovery: Search over candidate subsets $\mathbf{S}$ of a variable's neighbors. Two heuristics are proposed: H1, which returns the union of all candidate parent sets that are not rejected (aiming for a superset of the true parents), and H2, which returns the single subset with the highest minimum p-value if the size of the parent set is known or bounded (aiming for exact recovery).
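
A minimal end-to-end sketch of steps 1-4 for a single candidate parent set, assuming a one-dimensional continuous $V_i$ and a one-dimensional $\mathbf{S}$, conditional densities estimated as ratios of joint to marginal Gaussian KDEs, and a permutation-based HSIC test written from scratch (all function names, bandwidth choices, and permutation counts are illustrative assumptions rather than the authors' code):

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.spatial.distance import cdist

def conditional_kde(v, s):
    """Conditional density estimate p(v0 | s0) = p_joint(v0, s0) / p_marginal(s0)."""
    joint = gaussian_kde(np.vstack([v, s]))  # KDE over (V, S) pairs
    marg = gaussian_kde(s)                   # KDE over S alone
    return lambda v0, s0: joint(np.vstack([v0, s0])) / np.maximum(marg(np.atleast_1d(s0)), 1e-300)

def rbf_gram(x):
    """RBF Gram matrix with a median-heuristic bandwidth."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d2 = cdist(x, x, "sqeuclidean")
    sigma2 = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0
    return np.exp(-d2 / (2 * sigma2))

def hsic_pvalue(x, y, n_perm=200, seed=0):
    """Biased HSIC statistic with a permutation p-value for H0: x independent of y."""
    rng = np.random.default_rng(seed)
    n = len(x)
    K, L = rbf_gram(x), rbf_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n
    stat = np.trace(K @ H @ L @ H) / (n - 1) ** 2
    perm = [np.trace(K @ H @ L[np.ix_(p, p)] @ H) / (n - 1) ** 2
            for p in (rng.permutation(n) for _ in range(n_perm))]
    return (1 + sum(ps >= stat for ps in perm)) / (1 + n_perm)

def test_parent_set(v_by_domain, s_by_domain):
    """Minimum p-value over domains for the hypothesis that S is the parent set of V."""
    # Step 1: one conditional density estimate per domain.
    cond = [conditional_kde(v, s) for v, s in zip(v_by_domain, s_by_domain)]
    p_values = []
    for v, s in zip(v_by_domain, s_by_domain):
        # Step 2: evaluate every domain's conditional density at this domain's points
        # and normalize each row to obtain samples of Gamma_{S -> V}.
        dens = np.stack([c(v, s) for c in cond], axis=-1)   # shape (n_i, m)
        gamma = dens / dens.sum(axis=-1, keepdims=True)
        # Step 3: test independence between Gamma and S within this domain.
        p_values.append(hsic_pvalue(gamma, s))
    # Step 4: aggregate by taking the minimum p-value across domains.
    return min(p_values)
```

With data like the hypothetical simulation shown earlier, `test_parent_set([y for _, y in domains], [x for x, _ in domains])` would return the aggregated p-value for the (true) candidate parent set $\{X\}$ of $Y$; comparing it to a significance threshold and repeating the call over candidate subsets corresponds to step 5 (H1 or H2).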

Experimental results on synthetic data generated from a heteroscedastic noise model (a type of location-scale model falling under BGM) demonstrate that the proposed method (H1 and H2) outperforms existing multi-domain causal discovery algorithms like ICP, NLICP, MC, IB, LRE, CD-NOD, CdF, and MGL in terms of accuracy for bivariate cases and precision/recall/F1-score for multivariate cases. Applications to real-world CollegeDistance and Adult datasets illustrate its use in practice, identifying plausible parent variables like 'race' for educational attainment and 'country' for working hours.

Despite its strong performance and generality under BGM, the paper acknowledges several practical limitations:

  • Data Requirements: Accurate density estimation, especially in high-dimensional spaces, requires a large amount of data.
  • Domain Count: The method is most effective when the number of domains is relatively small compared to the data size, as the dimensionality of the $\Gamma$ variable depends on the number of domains.
  • Computational Cost: The search over candidate parent sets combined with density estimation and independence testing can be computationally intensive.
  • Completeness: The method guarantees soundness (rejected parent sets are truly not the full parent set) but not completeness (it doesn't guarantee that all non-parent sets will be rejected).

Overall, the paper provides a novel, theoretically grounded approach for multi-domain causal discovery under the flexible BGM assumption, offering a practical method that demonstrates promising results in experimental settings.