- The paper presents CD-NOD, which uses independent changes in causal modules to reveal true causal structures from nonstationary data.
- It details a three-phase process combining constraint-based skeleton estimation, kernel dependency tests for causal direction, and KPCA for visualizing mechanism changes.
- Experimental results on synthetic and real-world datasets demonstrate improved accuracy in identifying both causal skeletons and edge orientations.
The paper "Causal Discovery from Heterogeneous/Nonstationary Data with Independent Changes" (2019) introduces CD-NOD, a framework for causal discovery designed for data whose underlying generating process changes across domains or over time. Traditional causal discovery methods often assume a fixed causal model and may fail in such heterogeneous or nonstationary environments, inferring spurious causal links or incorrect directions. Rather than treating distribution shift as a nuisance, CD-NOD treats it as informative and exploits it to aid causal discovery.
The core idea is that changes in the joint distribution across different conditions (domains or time points, represented by a surrogate variable C) can be attributed to changes in the individual causal mechanisms (conditional distributions of a variable given its direct causes). If these mechanism changes are independent, they provide strong cues about the underlying causal structure. The paper introduces a Pseudo Causal Sufficiency Assumption, which posits that any unobserved confounders causing related changes in mechanisms can be represented as deterministic functions of the domain or time index C.
CD-NOD is presented in three phases:
Phase I: Changing Causal Module Detection and Causal Skeleton Estimation
This phase focuses on identifying which variables have changing causal mechanisms and recovering the undirected graph (skeleton) of the causal structure over the observed variables V. The key insight is to include the surrogate variable C in the causal discovery process.
- A complete undirected graph is built on the variable set V∪C.
- For each variable Vi∈V, the method tests for marginal and conditional independence between Vi and C given subsets of other variables. If Vi⊥C conditional on some subset of V∖{Vi}, the edge between Vi and C is removed. Variables remaining adjacent to C are deemed to have changing causal modules.
- For every pair Vi,Vj∈V, the method tests for marginal and conditional independence between Vi and Vj given subsets of (V∖{Vi,Vj})∪{C}. If independence is found, the edge between Vi and Vj is removed.
The paper proves that, under its assumptions, two variables Vi and Vj are not adjacent in the true causal graph over V if and only if they are conditionally independent given some subset of {Vk∣k≠i,k≠j}∪{C}. This justifies the use of constraint-based search algorithms (like PC) on the augmented set V∪C. A nonparametric conditional independence test, such as the Kernel-based Conditional Independence (KCI) test, is essential for this step due to the potentially complex relationship between variables and C. This phase also naturally aligns with the principle of minimal changes, identifying the smallest set of mechanisms that need to change to explain the data heterogeneity.
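The Phase I search can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it swaps the KCI test for a Fisher-z partial-correlation test (a linear-Gaussian stand-in), uses a simplified PC-style loop, and the names `fisher_z_pvalue`, `skeleton_with_surrogate`, `alpha`, and `max_cond` are hypothetical. The toy data are also an assumption: x1 and x2 both drift with C, and x3 depends only on x2 through a fixed mechanism.

```python
import itertools
import math
import numpy as np

def fisher_z_pvalue(data, i, j, cond):
    """Two-sided p-value for zero partial correlation between columns i and j
    given the columns in cond. Linear-Gaussian stand-in for the KCI test."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = abs(z) * math.sqrt(data.shape[0] - len(cond) - 3)
    return math.erfc(stat / math.sqrt(2))

def skeleton_with_surrogate(X, c, alpha=0.01, max_cond=2):
    """Simplified PC-style skeleton search on the columns of X plus C.
    Returns (edges among the observed variables, variables adjacent to C)."""
    data = np.column_stack([X, c])
    m = X.shape[1]  # column index m holds the surrogate variable C
    adj = {a: set(range(m + 1)) - {a} for a in range(m + 1)}
    for size in range(max_cond + 1):
        for a, b in itertools.combinations(range(m + 1), 2):
            if b not in adj[a]:
                continue
            pool = sorted((adj[a] | adj[b]) - {a, b})
            for cond in itertools.combinations(pool, size):
                if fisher_z_pvalue(data, a, b, cond) > alpha:
                    # Independence not rejected: drop the edge a - b.
                    adj[a].discard(b)
                    adj[b].discard(a)
                    break
    changing = sorted(adj[m])
    edges = {(a, b) for a in range(m) for b in adj[a] if a < b < m}
    return edges, changing

# Toy data (assumed for illustration): C is a time index.
rng = np.random.default_rng(0)
n = 2000
c = np.linspace(0.0, 1.0, n)
x1 = 2.0 * c + 0.3 * rng.normal(size=n)
x2 = -2.0 * c + 0.3 * rng.normal(size=n)
x3 = x2 + 0.3 * rng.normal(size=n)
edges, changing = skeleton_with_surrogate(np.column_stack([x1, x2, x3]), c, alpha=0.001)
print("edges among V:", edges, "| adjacent to C:", changing)
```

Without C in the conditioning sets, x1 and x2 would stay connected because both drift with C; conditioning on C is what lets the search drop that spurious edge.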
Phase II: Causal Direction Determination
This phase leverages the information in distribution shifts to orient the edges in the skeleton. The core principle exploited is the Independent Changes Principle: if Vi→Vj and there are no confounders for this relationship, then the causal mechanisms P(Vi) and P(Vj∣Vi) are expected to change independently across values of C. This independence tends to be violated for the reverse direction.
- Generalization of Invariance: If an edge connects a variable Vk adjacent to C (a changing module) and a variable Vl not adjacent to C (a stationary module), the triple C−Vk−Vl can often be oriented using standard V-structure rules, treating C→Vk. If Vl⊥C given a set that excludes Vk, then Vk is a collider, which suggests Vk←Vl. If Vl⊥C only given a set that includes Vk, it suggests Vk→Vl. This essentially tests for invariance of P(Vl) or P(Vl∣Vk) using conditional independence tests with C.
- Independently Changing Modules: For adjacent variables Vk,Vl both adjacent to C, the direction is determined by comparing the dependence between hypothetical causal modules. To decide between Vk→Vl and Vl→Vk, the method compares the dependence between P(Vk) and P(Vl∣Vk) across different C values with the dependence between P(Vl) and P(Vk∣Vl) across different C values. The direction with lower dependence between modules is preferred.
- To implement this nonparametrically, the paper proposes representing changing (conditional) distributions P(Y∣X,C=c) using a novel kernel embedding technique. A "virtual" joint distribution P̃(Y,X∣C=c)=P(Y∣X,C=c)P(X) is constructed, and its kernel embedding is estimated from the entire dataset without windowing (Proposition 1).
- The dependence between the sequences of embeddings of the hypothetical modules (e.g., {P(Vk∣C=cn)} and {P(Vl∣Vk,C=cn)} for n=1,…,N) is measured using an extended version of the Hilbert-Schmidt Independence Criterion (HSIC). Gram matrices of the embeddings are computed efficiently using kernel tricks.
- The method iteratively identifies causal directions among variables with changing modules, considering deconfounding sets to account for common causes influencing the module dependence.
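The module-dependence comparison above can be illustrated with a normalized HSIC-style statistic computed from two Gram matrices. This is a hedged sketch: the paper's extended HSIC operates on Gram matrices of the kernel embeddings from Proposition 1, whereas the Gram matrices here are built from toy scalar sequences purely for illustration. The names `rbf_gram`, `normalized_hsic`, and `prefer_direction` are hypothetical, and the trace normalization is an assumption of this sketch.

```python
import numpy as np

def rbf_gram(x, width=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sequence (toy stand-in for the
    Gram matrix of module embeddings)."""
    x = np.asarray(x, dtype=float)
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def center(K):
    """Double-center a Gram matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H

def normalized_hsic(K, L):
    """Trace-normalized HSIC-style dependence score between two Gram matrices."""
    Kc, Lc = center(K), center(L)
    return float(np.trace(Kc @ Lc) / (np.trace(Kc) * np.trace(Lc)))

def prefer_direction(score_forward, score_backward):
    """Pick the direction whose hypothetical modules are less dependent."""
    return "forward" if score_forward < score_backward else "backward"

# Toy sequences: one independent pair and one nonlinearly dependent pair.
rng = np.random.default_rng(1)
n = 300
a = rng.normal(size=n)
b_indep = rng.normal(size=n)
b_dep = a ** 2 + 0.1 * rng.normal(size=n)

score_indep = normalized_hsic(rbf_gram(a), rbf_gram(b_indep))
score_dep = normalized_hsic(rbf_gram(a), rbf_gram(b_dep))
print(score_indep, score_dep, prefer_direction(score_indep, score_dep))
```

In an actual run, the two scores would come from the two candidate factorizations of the same variable pair, and the lower score would indicate the preferred orientation.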
The paper provides identifiability conditions (Theorem 2) for orienting edges using CD-NOD, noting that some edges may remain unoriented if these conditions are not met, resulting in a CD-NOD equivalence class.
Phase III: Nonstationary Driving Force Estimation
After learning the causal structure and identifying changing modules, this phase focuses on visualizing how the mechanisms change. The goal is to find a low-dimensional representation λi(C) of the changing conditional distribution P(Vi∣PAi,C) for each variable Vi with a changing module.
- The estimated kernel embeddings of the changing conditional distributions (computed in Phase II) are used.
- Kernel Principal Component Analysis (KPCA) is applied to the Gram matrix of these embeddings.
- The principal components derived from KPCA provide a low-dimensional representation of the changes in the causal module across the values of C. This method, termed Kernel Nonstationary Visualization (KNV), avoids explicit modeling of the functional form of change and does not require windowing.
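The KPCA step can be sketched as an eigendecomposition of the centered Gram matrix. This is a minimal illustration under stated assumptions: the Gram matrix `M` below is built from a toy noisy signal rather than from the Phase II embeddings, and `kpca_from_gram` is a hypothetical helper name.

```python
import numpy as np

def kpca_from_gram(M, n_components=2):
    """Kernel PCA from a precomputed Gram matrix.
    Returns (component scores with one row per sample, top eigenvalues)."""
    n = M.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Mc = H @ M @ H
    Mc = (Mc + Mc.T) / 2.0  # symmetrize against round-off
    vals, vecs = np.linalg.eigh(Mc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    scores = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return scores, vals

# Toy stand-in: a smooth driving force observed with noise at each time point.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
driving = np.sin(2 * np.pi * t)
toy = driving + 0.05 * rng.normal(size=t.size)
M = np.exp(-(toy[:, None] - toy[None, :]) ** 2 / 2.0)

components, eigenvalues = kpca_from_gram(M, n_components=2)
corr = abs(np.corrcoef(components[:, 0], driving)[0, 1])
print("abs. correlation of first component with the driving force:", corr)
```

Plotting the first column of `components` against the time index would give the kind of driving-force curve the visualization step aims for; the sign of a component is arbitrary, which is why the check uses the absolute correlation.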
Extensions
- Time-Varying Lagged and Instantaneous Relations: The framework can be extended by reorganizing time series variables into blocks representing different time points. The same constraint-based approach with C can then be applied to this augmented set of variables to recover both lagged and instantaneous connections. Lagged connections (past → future) are naturally oriented, while instantaneous ones use Phase II methods.
- Stationary Confounders: The paper discusses via examples how distribution shifts can still help distinguish causal directions even in the presence of unobserved stationary confounders by inducing specific patterns of dependence between modules.
- Combination with Functional Causal Models: For edges that remain unoriented after applying CD-NOD's methods, traditional functional causal model approaches (like Additive Noise Models) can be applied, potentially conditioning on inferred deconfounding variables from the CD-NOD process.
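For the time-varying lagged extension listed above, reorganizing a time series into blocks might look like the following sketch. The helper name `stack_lags`, the column ordering (current block first, then increasing lags), and the use of the row's time stamp as the surrogate C are assumptions made for illustration.

```python
import numpy as np

def stack_lags(X, max_lag):
    """Reorganize a (T, m) time series into rows [X_t, X_{t-1}, ..., X_{t-max_lag}].
    Returns the stacked array and the time index of each row (used as C)."""
    T, m = X.shape
    blocks = [X[max_lag - k : T - k] for k in range(max_lag + 1)]
    stacked = np.hstack(blocks)
    c = np.arange(max_lag, T)
    return stacked, c

# Small deterministic example: 6 time points, 2 variables, lags up to 2.
X = np.arange(12).reshape(6, 2)
stacked, c = stack_lags(X, max_lag=2)
print(stacked.shape)
print(stacked[0], c[0])
```

Each row then holds the current values followed by the lagged copies, so a constraint-based search over these columns plus C can consider instantaneous and lagged edges together.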
Relation to Soft Intervention
The paper highlights the connection between data heterogeneity/nonstationarity and soft interventions. Heterogeneity/nonstationarity can be seen as the result of natural "soft interventions" mediated by the variable C, which influences the conditional distributions of some variables without necessarily breaking incoming causal links. The CD-NOD framework is more general than existing soft intervention methods, as it can automatically detect which variables are "intervened" upon, handles continuous changes in the intervention strength (via C), and allows for multiple and pseudo-interventions (confounders modeled by C).
Implementation Considerations
- The use of kernel methods (KCI-test, kernel embeddings, KPCA) provides nonparametric flexibility but comes with O(N³) computational complexity for N samples, which can be a bottleneck for very large datasets. Faster approximate kernel methods may be necessary.
- Choosing kernel hyperparameters is crucial and may require cross-validation or other data-driven approaches.
- The threshold α for detecting pseudo confounders in Phase II needs to be carefully chosen.
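As one example of a data-driven choice of kernel hyperparameters, the median heuristic sets a Gaussian kernel width to the median pairwise distance between samples. This is a commonly used default and is shown here only as an assumption; the summary above does not say which selection rule the paper uses, and `median_heuristic_width` is a hypothetical helper name.

```python
import numpy as np

def median_heuristic_width(x):
    """Gaussian kernel width set to the median pairwise Euclidean distance."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)  # treat 1-D input as n samples of dimension 1
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    upper = np.triu_indices(len(x), k=1)  # each unordered pair once
    return float(np.sqrt(np.median(d2[upper])))

width = median_heuristic_width([0.0, 1.0, 3.0])
print(width)
```

For large N, computing the heuristic on a random subsample avoids building the full pairwise distance matrix.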
Experimental Results
The paper validates CD-NOD on both synthetic and real-world datasets.
- Synthetic Data: Experiments show that CD-NOD (Phase I) significantly improves skeleton estimation accuracy (higher F1 and precision) compared to standard constraint-based methods and linear-system-based methods (IB, MC), which struggle with spurious edges induced by related changes. CD-NOD (Phase II) also shows superior direction identification performance. KNV (Phase III) effectively recovers underlying changing components, outperforming linear GP models and change point detection methods, especially for smooth changes.
- Real-World Data:
  - Task fMRI: CD-NOD is applied to task fMRI data. It identifies time-varying causal modules in brain regions related to visual and language processing. The estimated graph structure aligns with neuroscientific understanding. The KNV driving forces show distinct patterns corresponding to resting states and task states, demonstrating the method's ability to capture meaningful changes.
  - Stock Returns: CD-NOD is applied to daily stock returns from the Hong Kong and US markets. It identifies time-varying modules in stocks from sectors known to be sensitive to external economic factors (finance, energy, etc.). The inferred causal structures show plausible relationships (e.g., major banks causing others). The KNV driving forces exhibit changes that align with the timing of the 2008 financial crisis and other significant market events (like Hang Seng Bank's restructuring), suggesting they capture relevant market dynamics.
In conclusion, CD-NOD provides a practical framework for causal discovery in settings with heterogeneous and nonstationary data. By explicitly modeling the effects of distribution shifts through a surrogate variable C and leveraging the principle of independent changes in causal modules, it recovers causal skeletons and orients edges more accurately than methods that assume stationarity or linearity. The proposed KNV method further allows for the visualization and analysis of the estimated "driving forces" behind the observed changes in causal mechanisms.