- The paper introduces a novel multi-domain causal discovery approach leveraging bijective generation mechanisms to infer invariant causal directions from observational data.
- It develops a practical methodology using density-vectorization and independence tests to distinguish causal relations across varying domain distributions.
- Experimental results demonstrate superior accuracy and precision compared to existing methods, despite challenges with data requirements and computational complexity.
This paper, "Multi-Domain Causal Discovery in Bijective Causal Models" (2504.21261), addresses the problem of learning causal structure (represented as a directed acyclic graph) from observational data collected across multiple distinct domains or environments. This setting is challenging because the distributions of variables can vary significantly across domains, potentially confounding standard causal discovery algorithms that rely on distributional properties within a single dataset.
The core problem is to identify the causal graph when the causal mechanisms (the functions relating variables to their parents and noise) are invariant across domains, but the distributions of the exogenous noise can change.
The paper introduces the concept of Bijective Generation Mechanisms (BGM) as a key assumption. Under BGM, for any variable Vi and any fixed value of its parents PAi=pai, the function mapping the exogenous noise Ei to the variable Vi (i.e., fi(pai,⋅):Ei→Vi) is assumed to be a diffeomorphism (bijective and differentiable with a differentiable inverse) for continuous variables, or simply a bijection for discrete variables. This assumption is shown to generalize several common noise models, including additive noise models, LiNGAM, post-nonlinear models, and location-scale noise models, making the proposed method applicable to a wider range of real-world scenarios.
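As a short worked check of why a location-scale noise model satisfies BGM, consider the following sketch. The symbols $\mu$ and $\sigma$ are illustrative names for the location and scale functions and are not taken from the paper; the only assumption is that $\sigma$ is strictly positive.

$$
V_i = \mu(pa_i) + \sigma(pa_i)\,E_i, \qquad \sigma(pa_i) > 0 .
$$

For a fixed parent value $pa_i$, the map $e \mapsto \mu(pa_i) + \sigma(pa_i)\,e$ is affine with nonzero slope, so it is bijective and differentiable, and its inverse

$$
e = \frac{v_i - \mu(pa_i)}{\sigma(pa_i)}
$$

is differentiable as well. The map is therefore a diffeomorphism, as BGM requires. Setting $\sigma \equiv 1$ recovers an additive noise model as a special case.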
The central theoretical contribution is based on the notion of "similarity" between random variables in a multi-domain setting. Two random variables A and B are defined as "similar" (A∼B) if there exists a diffeomorphism (or, in the discrete case, a bijection) g such that g(A) has the same distribution as B in every domain. The paper proves that under the BGM assumption, if X is a cause of Y, then the conditional distributions of Y given different values of X (denoted Y~x) are pairwise similar, i.e., ∀a,b∈X:Y~a∼Y~b.
To make this testable, the paper introduces the "density-vectorization" operator ΦV(v), which takes a point v in the support of a variable V and returns a normalized vector of its density (or probability mass) across all domains at that point. Because its m entries are nonnegative and sum to one, this vector ΦV(v) lies on the probability simplex in R^m (a set of dimension m−1), where m is the number of domains. The "special random variable" ΨV is then defined as ΨV=ΦV(V), which captures the probabilistic profile of V's density across domains for a randomly sampled value of V. A crucial result (Proposition 2) is that if two random variables are similar, their corresponding special random variables are identically distributed in every domain ($\Psi_A \overset{I}{=} \Psi_B$).
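The density-vectorization operator can be illustrated with a minimal Python sketch. The setup is hypothetical and not from the paper: A is Gaussian with a different mean in each of three domains, and B = exp(A), so A and B are similar through the diffeomorphism g = exp. The densities are known in closed form here, whereas in practice they would have to be estimated.

```python
import numpy as np

def normal_pdf(x, mu, sigma=1.0):
    """Gaussian density, vectorized over the domain means."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical setup: A ~ N(mu_d, 1) in domain d, with m = 3 domains.
domain_means = np.array([-1.0, 0.0, 2.0])

def phi_A(a):
    """Density vectorization of A at the point a: one density per domain, normalized."""
    dens = normal_pdf(a, domain_means)
    return dens / dens.sum()

def phi_B(b):
    """Density vectorization of B = exp(A) at the point b > 0."""
    # Change of variables: p_B(b) = p_A(log b) / b in every domain.
    dens = normal_pdf(np.log(b), domain_means) / b
    return dens / dens.sum()

a = 0.7
vec_a = phi_A(a)            # Phi_A evaluated at a
vec_b = phi_B(np.exp(a))    # Phi_B evaluated at the matching point g(a)
print(vec_a, vec_b)
```

The Jacobian factor 1/b is the same in every domain, so it cancels in the normalization and the two vectors coincide. This is the mechanism by which similar variables end up with the same Ψ distribution.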
Building on this, the paper defines ΓX→Y:=ΨY~X. The main theoretical finding (Theorem 1) states that under the BGM assumption, the causal direction X→Y implies that the special random variable ΓX→Y is statistically independent of X across all domains ($\Gamma_{X \to Y} \perp\!\!\!\perp X$). This provides a practical criterion for causal discovery: if ΓX→Y is not independent of X in at least one domain, we can reject the hypothesis that X causes Y. The extensions to discrete and multivariate cases show that this independence test can be applied to test potential parent sets S for a variable Vi by checking $\Gamma_{\mathbf{S} \to V_i} \perp\!\!\!\perp \mathbf{S}$.
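The step from similarity to independence can be sketched as follows. This is a reading of the argument as summarized here, not the paper's full proof, and $\overset{d}{=}$ denotes equality in distribution within a given domain.

$$
\Gamma_{X \to Y} \mid (X = x) \;\overset{d}{=}\; \Psi_{\tilde{Y}_x},
\qquad
\tilde{Y}_a \sim \tilde{Y}_b \;\Longrightarrow\; \Psi_{\tilde{Y}_a} \overset{d}{=} \Psi_{\tilde{Y}_b}
\quad \text{for all } a, b .
$$

The conditional distribution of $\Gamma_{X \to Y}$ given $X = x$ is therefore the same for every $x$, which is exactly the statement $\Gamma_{X \to Y} \perp\!\!\!\perp X$ in that domain.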
For practical implementation, the algorithm involves the following steps:
- Conditional Density Estimation: Estimate the conditional density (or mass) function pVi∣Sj(vi∣s) for each domain j and each candidate parent set S of a variable Vi. Non-parametric methods like kernel density estimation (e.g., using the `np` package in R, as mentioned by the authors) can be used.
- Sample Generation for ΓS→Vi: For each observed data point (sk,vi,k) from domain j, evaluate the estimated conditional density of every domain at that point to construct the vector (pVi∣S1(vi,k∣sk),…,pVi∣Sm(vi,k∣sk)). Normalize this vector so that its entries sum to one to get a sample of ΓS→Vi for that data point and domain.
- Independence Testing: For each domain j, perform a statistical test for independence between the generated samples of ΓS→Vi and the corresponding values of S. A d-variable HSIC test is suggested.
- Aggregation: Combine the p-values from the independence tests across all domains. The minimum p-value is used; if it falls below a significance threshold, the hypothesis that S is the parent set is rejected.
- Parent Set Discovery: Search over candidate subsets S of a variable's neighbors. Two heuristics are proposed: H1, which returns the union of all candidate parent sets that are not rejected (aiming for a superset of true parents), and H2, which returns the single subset with the highest minimum p-value if the size of the parent set is known or bounded (aiming for exact recovery).
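The first four steps above can be sketched for the bivariate case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed-bandwidth Gaussian kernel estimator, the two-variable permutation HSIC test (in place of the suggested d-variable HSIC), the synthetic heteroscedastic data, and all function names are choices made here for illustration.

```python
import numpy as np

def cond_density(x_train, y_train, x_eval, y_eval, hx=0.3, hy=0.3):
    """Kernel estimate of p(y | x) at paired evaluation points (fixed bandwidths)."""
    # Rows index evaluation points, columns index training points.
    kx = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / hx) ** 2)
    ky = np.exp(-0.5 * ((y_eval[:, None] - y_train[None, :]) / hy) ** 2)
    ky /= hy * np.sqrt(2 * np.pi)
    return (kx * ky).sum(axis=1) / (kx.sum(axis=1) + 1e-300)

def gamma_samples(domains):
    """For each domain's points, stack every domain's conditional density and normalize."""
    out = []
    for x_eval, y_eval in domains:
        dens = np.column_stack(
            [cond_density(x, y, x_eval, y_eval) for x, y in domains]
        )
        out.append(dens / dens.sum(axis=1, keepdims=True))
    return out

def _gram(z):
    """Gaussian Gram matrix with a median-heuristic bandwidth."""
    z = z.reshape(len(z), -1)
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic_pvalue(a, b, rng, n_perm=200):
    """Permutation p-value for the HSIC statistic between samples a and b."""
    n = len(a)
    H = np.eye(n) - 1.0 / n               # centering matrix
    K = H @ _gram(a) @ H
    L = _gram(b)
    stat = (K * L).sum()                  # equals trace(K L) for symmetric matrices
    null = [(K * L[np.ix_(p, p)]).sum()
            for p in (rng.permutation(n) for _ in range(n_perm))]
    return (1 + sum(s >= stat for s in null)) / (1 + n_perm)

def min_pvalue_for_direction(domains, rng):
    """Aggregate by taking the minimum p-value over domains (cause is first in each pair)."""
    gam = gamma_samples(domains)
    return min(hsic_pvalue(g, x, rng) for g, (x, _) in zip(gam, domains))

# Hypothetical synthetic data: X -> Y with heteroscedastic noise whose scale varies by domain.
rng = np.random.default_rng(0)

def make_domain(n, x_scale, noise_scale):
    x = rng.normal(0.0, x_scale, n)
    e = rng.normal(0.0, noise_scale, n)
    return x, np.tanh(x) + (0.5 + 0.5 * np.abs(x)) * e

domains = [make_domain(150, 1.0, 0.5), make_domain(150, 1.5, 1.0)]
p_xy = min_pvalue_for_direction(domains, rng)                           # tests X -> Y
p_yx = min_pvalue_for_direction([(y, x) for x, y in domains], rng)      # tests Y -> X
print(f"min p-value X->Y: {p_xy:.3f}, Y->X: {p_yx:.3f}")
```

A direction is rejected when its minimum p-value falls below the chosen significance threshold. The sketch reuses the same samples for density estimation and testing and does not tune bandwidths, so its p-values should be read as illustrative only.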
Experimental results on synthetic data generated from a heteroscedastic noise model (a type of location-scale model falling under BGM) demonstrate that the proposed method (H1 and H2) outperforms existing multi-domain causal discovery algorithms like ICP, NLICP, MC, IB, LRE, CD-NOD, CdF, and MGL in terms of accuracy for bivariate cases and precision/recall/F1-score for multivariate cases. Applications to real-world CollegeDistance and Adult datasets illustrate its use in practice, identifying plausible parent variables like 'race' for educational attainment and 'country' for working hours.
Despite its strong performance and generality under BGM, the paper acknowledges several practical limitations:
- Data Requirements: Accurate density estimation, especially in high-dimensional spaces, requires a large amount of data.
- Domain Count: The method is most effective when the number of domains is relatively small compared to the data size, as the dimensionality of the Γ variable depends on the number of domains.
- Computational Cost: The search over candidate parent sets combined with density estimation and independence testing can be computationally intensive.
- Completeness: The method guarantees soundness (rejected parent sets are truly not the full parent set) but not completeness (it doesn't guarantee that all non-parent sets will be rejected).
Overall, the paper provides a novel, theoretically grounded approach for multi-domain causal discovery under the flexible BGM assumption, offering a practical method that demonstrates promising results in experimental settings.