Causal Discovery from Heterogeneous/Nonstationary Data with Independent Changes (1903.01672v5)

Published 5 Mar 2019 in cs.LG and stat.ML

Abstract: It is commonplace to encounter heterogeneous or nonstationary data, of which the underlying generating process changes across domains or over time. Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find causal skeleton and directions and estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even with particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) are presented to demonstrate the efficacy of the proposed methods.

Citations (200)

Summary

  • The paper presents CD-NOD, which uses independent changes in causal modules to reveal true causal structures from nonstationary data.
  • It details a three-phase process combining constraint-based skeleton estimation, kernel dependency tests for causal direction, and KPCA for visualizing mechanism changes.
  • Experimental results on synthetic and real-world datasets demonstrate improved accuracy in identifying both causal skeletons and edge orientations.

The paper "Causal Discovery from Heterogeneous/Nonstationary Data with Independent Changes" (2019) introduces CD-NOD, a causal discovery framework designed for data whose generating process changes across domains or over time. Traditional causal discovery methods often assume a fixed causal model and may fail in such heterogeneous or nonstationary settings, inferring spurious causal links or incorrect directions. Rather than treating distribution shift as a nuisance, CD-NOD exploits it as a source of information about the causal structure.

The core idea is that changes in the joint distribution across conditions (domains or time points, represented by a surrogate variable $C$) can be attributed to changes in individual causal mechanisms (the conditional distribution of each variable given its direct causes). If these mechanism changes are independent, they provide strong cues about the underlying causal structure. The paper introduces a Pseudo Causal Sufficiency Assumption, which posits that any unobserved confounders causing related changes in mechanisms can be represented as deterministic functions of the domain or time index $C$.
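
A minimal simulation can make the idea of modular change concrete. The sketch below is a hypothetical two-variable example, not taken from the paper. It assumes $X \rightarrow Y$, uses the time index as the surrogate $C$, and lets only $P(X \mid C)$ drift. The marginal of $Y$ then shifts over time, while a regression of $Y$ on $X$ stays put because that module never changes.

```python
import numpy as np


def simulate_nonstationary(n=4000, seed=0):
    """Hypothetical example: X -> Y where only the module P(X | C) changes.

    C is the surrogate variable (the time index, rescaled to [0, 1)).
    The mean of X drifts smoothly with C; the mechanism Y = 2 X + noise is fixed.
    """
    rng = np.random.default_rng(seed)
    c = np.arange(n) / n
    x = 3.0 * np.sin(2 * np.pi * c) + rng.normal(size=n)  # changing module P(X | C)
    y = 2.0 * x + rng.normal(size=n)                      # fixed module P(Y | X)
    return c, x, y


def ols_slope(x, y):
    """Least-squares slope of y on x (intercept included)."""
    return np.polyfit(x, y, 1)[0]


c, x, y = simulate_nonstationary()
q = len(c) // 4
early, late = slice(0, q), slice(2 * q, 3 * q)   # first and third quarters of the record

mean_shift = y[early].mean() - y[late].mean()                          # P(Y) changes
slopes = (ols_slope(x[early], y[early]), ols_slope(x[late], y[late]))  # P(Y | X) does not
print(f"shift in mean of Y: {mean_shift:.2f}; slopes of Y on X: {slopes[0]:.2f}, {slopes[1]:.2f}")
```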

CD-NOD is presented in three phases:

Phase I: Changing Causal Module Detection and Causal Skeleton Estimation

This phase identifies which variables have changing causal mechanisms and recovers the undirected graph (skeleton) of the causal structure over the observed variables $\mathbf{V}$. The key insight is to include the surrogate variable $C$ in the causal discovery process.

  1. A complete undirected graph is built on the variable set $\mathbf{V} \cup \{C\}$.
  2. For each variable $V_i \in \mathbf{V}$, the method tests for marginal and conditional independence between $V_i$ and $C$ given subsets of the other variables. If $V_i \perp C$ conditional on some subset of $\mathbf{V} \setminus \{V_i\}$, the edge between $V_i$ and $C$ is removed. Variables that remain adjacent to $C$ are deemed to have changing causal modules.
  3. For every pair $V_i, V_j \in \mathbf{V}$, the method tests for marginal and conditional independence between $V_i$ and $V_j$ given subsets of $(\mathbf{V} \setminus \{V_i, V_j\}) \cup \{C\}$. If independence is found, the edge between $V_i$ and $V_j$ is removed. The paper proves that, under its assumptions, $V_i$ and $V_j$ are not adjacent in the true causal graph over $\mathbf{V}$ if and only if they are conditionally independent given some subset of $\{V_k \mid k \neq i, k \neq j\} \cup \{C\}$. This justifies running constraint-based search algorithms (such as PC) on the augmented set $\mathbf{V} \cup \{C\}$. A nonparametric conditional independence test, such as the Kernel-based Conditional Independence (KCI) test, is essential here because the relationships between the variables and $C$ can be complex. This phase also aligns naturally with the principle of minimal changes: it identifies the smallest set of mechanisms that must change to explain the heterogeneity in the data.
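
The Phase I procedure can be sketched as an ordinary PC skeleton search run on the data matrix with $C$ appended as an extra column. This is a minimal sketch under stated assumptions. A Fisher-z partial-correlation test stands in for the KCI test, so it is only appropriate for roughly linear-Gaussian data. The function names are hypothetical, and the three-variable dataset is made up for illustration.

```python
import itertools
import math

import numpy as np


def fisher_z_independent(data, i, j, cond, alpha):
    """Stand-in CI test: Fisher z on the partial correlation of columns i and j.

    Assumes roughly linear-Gaussian relations. CD-NOD itself calls for a
    nonparametric test such as KCI, which this sketch does not implement.
    Returns True when independence is NOT rejected at level alpha.
    """
    idx = [i, j, *cond]
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation
    r = min(max(r, -0.999999), 0.999999)
    z = math.atanh(r) * math.sqrt(data.shape[0] - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))              # two-sided p-value
    return p_value > alpha


def pc_skeleton(data, alpha=0.01, ci_test=fisher_z_independent):
    """PC-style skeleton search over all columns of `data` (here V plus C)."""
    d = data.shape[1]
    adj = {v: set(range(d)) - {v} for v in range(d)}
    size = 0
    # Grow the conditioning-set size while some node still has enough neighbours.
    while any(len(adj[v]) - 1 >= size for v in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):
                # Condition on subsets of i's current neighbours, excluding j.
                for cond in itertools.combinations(sorted(adj[i] - {j}), size):
                    if ci_test(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return {(i, j) for i in adj for j in adj[i] if i < j}


# Hypothetical data: X1 and X3 share a trend in C but are not causally linked; X1 -> X2.
rng = np.random.default_rng(1)
n = 2000
c = np.arange(n) / n                          # surrogate variable C (time index)
x1 = 3 * c + rng.normal(size=n)
x3 = 3 * c + rng.normal(size=n)
x2 = 1.5 * x1 + rng.normal(size=n)
data = np.column_stack([x1, x2, x3, c])       # columns: X1, X2, X3, C

with_c = pc_skeleton(data)                    # expected: X1-X2, X1-C, X3-C
without_c = pc_skeleton(data[:, :3])          # the shared trend leaves a spurious X1-X3 edge
print(sorted(with_c), sorted(without_c))
```

Running the same search without the $C$ column keeps the spurious $X_1 - X_3$ edge, because the shared trend makes $X_1$ and $X_3$ dependent; conditioning on $C$ removes it.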

Phase II: Causal Direction Determination

This phase uses the information in distribution shifts to orient the edges of the skeleton. The core principle is the Independent Changes Principle: if $V_i \rightarrow V_j$ and this relationship has no confounders, then the causal modules $P(V_i)$ and $P(V_j \mid V_i)$ are expected to change independently across values of $C$. This independence tends to be violated in the reverse factorization into $P(V_j)$ and $P(V_i \mid V_j)$.
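
A hypothetical two-variable Gaussian example (not from the paper) shows why the principle can break the symmetry between the two directions. Assume $X \rightarrow Y$, with a domain-specific mean $m_c$ for $X$ and a domain-specific noise scale $s_c$ for $Y$, where $m_c$ and $s_c$ vary independently across $c$. Standard Gaussian conditioning gives the modules of both factorizations:

```latex
\begin{aligned}
&\text{Causal } (X \to Y):\; X \mid C=c \sim \mathcal{N}(m_c,\, 1), \quad
  Y \mid X, C=c \sim \mathcal{N}(X,\, s_c^2) \\[4pt]
&\text{Reverse } (Y \to X):\; Y \mid C=c \sim \mathcal{N}\bigl(m_c,\; 1 + s_c^2\bigr), \\
&\qquad X \mid Y, C=c \sim \mathcal{N}\!\left(\frac{s_c^2}{1+s_c^2}\, m_c + \frac{1}{1+s_c^2}\, Y,\;\; \frac{s_c^2}{1+s_c^2}\right)
\end{aligned}
```

In the causal factorization the two modules are governed by separate parameters ($m_c$ and $s_c$), so their changes are independent. In the reverse factorization both $P(Y \mid C=c)$ and $P(X \mid Y, C=c)$ depend on $s_c$ (and on $m_c$), so their changes are coupled.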

  1. Generalization of Invariance: If an edge connects a variable $V_k$ adjacent to $C$ (a changing module) and a variable $V_l$ not adjacent to $C$ (a stationary module), the triple $C - V_k - V_l$ can often be oriented with the standard V-structure rules, treating the edge to $C$ as $C \rightarrow V_k$. If $V_l \perp C$ given some set that excludes $V_k$, the triple is a collider, which suggests $V_k \leftarrow V_l$. If $V_l \not\perp C$ but $V_l \perp C$ given a set that includes $V_k$, this suggests $V_k \rightarrow V_l$. In effect, this tests whether $P(V_l)$ or $P(V_l \mid V_k)$ is invariant across $C$ using conditional independence tests with $C$.
  2. Independently Changing Modules: For adjacent variables $V_k$ and $V_l$ that are both adjacent to $C$, the direction is determined by comparing how strongly the hypothetical causal modules depend on each other. The method compares the dependence between $P(V_k)$ and $P(V_l \mid V_k)$ across values of $C$ (the modules implied by $V_k \rightarrow V_l$) with the dependence between $P(V_l)$ and $P(V_k \mid V_l)$ across values of $C$ (the modules implied by $V_l \rightarrow V_k$). The direction whose modules are less dependent is preferred.
    • To implement this nonparametrically, the paper represents changing (conditional) distributions $P(Y \mid X, C=c)$ with a novel kernel embedding technique. A "virtual" joint distribution $\tilde{P}(\underline{Y}, X \mid C=c) = P(Y \mid X, C=c)\,P(X)$ is constructed, and its kernel embedding is estimated from the entire dataset without windowing (Proposition 1).
    • The dependence between the sequences of embeddings of the hypothetical modules (e.g., $\{P(V_k \mid C=c_n)\}_{n=1}^N$ and $\{P(V_l \mid V_k, C=c_n)\}_{n=1}^N$) is measured with an extended version of the Hilbert-Schmidt Independence Criterion (HSIC). The Gram matrices of the embeddings are computed efficiently with kernel tricks.
    • The method iteratively identifies causal directions among variables with changing modules, considering deconfounding sets to account for common causes influencing the module dependence. The paper provides identifiability conditions (Theorem 2) for orienting edges using CD-NOD, noting that some edges may remain unoriented if these conditions are not met, resulting in a CD-NOD equivalence class.
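
The comparison of module dependence can be sketched as follows. This is a simplified stand-in, not the paper's estimator. It assumes discrete domains with enough samples in each and summarizes each module by per-domain Gaussian/linear parameters: the mean and standard deviation of the cause, and the slope, intercept, and residual scale of the effect. It then applies a plain biased HSIC with a Gaussian kernel to the resulting sequences. The paper instead uses kernel embeddings estimated without windowing, an extended HSIC, and deconfounding sets. The function names and the simulated data are hypothetical. In the data, $X \rightarrow Y$, and the mean of $X$ and the noise scale of $Y$ vary independently across domains.

```python
import numpy as np


def gaussian_gram(f):
    """Gaussian-kernel Gram matrix; bandwidth^2 = median squared pairwise distance."""
    d2 = ((f[:, None, :] - f[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * np.median(d2[d2 > 0])))


def hsic(f, g):
    """Biased HSIC estimate, trace(K H L H) / n^2, between paired rows of f and g."""
    n = f.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gaussian_gram(f) @ h @ gaussian_gram(g) @ h) / n**2


def module_summaries(cause, effect, domains):
    """Per-domain summaries of P(cause | C=c) and P(effect | cause, C=c)."""
    marginal, conditional = [], []
    for dom in np.unique(domains):
        xc, yc = cause[domains == dom], effect[domains == dom]
        slope, intercept = np.polyfit(xc, yc, 1)
        residual = yc - (slope * xc + intercept)
        marginal.append([xc.mean(), xc.std()])
        conditional.append([slope, intercept, residual.std()])
    return np.array(marginal), np.array(conditional)


# Hypothetical data: X -> Y, with the mean of X and the noise scale of Y drawn independently.
rng = np.random.default_rng(0)
n_domains, per_domain = 30, 300
m = rng.uniform(-2.0, 2.0, n_domains)       # mean of X in each domain
s = rng.uniform(0.5, 2.0, n_domains)        # noise scale of Y given X in each domain
domains = np.repeat(np.arange(n_domains), per_domain)
x = rng.normal(m[domains], 1.0)
y = x + rng.normal(0.0, s[domains])

dep_xy = hsic(*module_summaries(x, y, domains))   # modules of X -> Y
dep_yx = hsic(*module_summaries(y, x, domains))   # modules of Y -> X
print(f"X->Y: {dep_xy:.4f}  Y->X: {dep_yx:.4f}")  # the smaller value is preferred
```

Under these assumptions the causal direction should yield the smaller statistic, because only the reverse factorization ties the two modules together through shared parameters. A real analysis would also need a decision rule, such as a permutation-based threshold, which this sketch omits.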

Phase III: Nonstationary Driving Force Estimation

After learning the causal structure and identifying the changing modules, this phase characterizes how the mechanisms change. The goal is to find a low-dimensional representation $\lambda_i(C)$ of the changing conditional distribution $P(V_i \mid \mathrm{PA}^i, C)$ for each variable $V_i$ with a changing module.

  1. The estimated kernel embeddings of the changing conditional distributions (computed in Phase II) are used.
  2. Kernel Principal Component Analysis (KPCA) is applied to the Gram matrix of these embeddings.
  3. The principal components from KPCA give a low-dimensional representation of the changes in the causal module across values of $C$. This method, termed Kernel Nonstationary Visualization (KNV), avoids explicitly modeling the functional form of the change and does not require windowing.
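
The KNV step reduces to kernel PCA on a Gram matrix that describes how a module varies across $C$. In the sketch below, that Gram matrix is a simplification made for brevity. It is a Gaussian kernel over per-segment summaries (slope and intercept) of $P(Y \mid X, C)$, rather than the paper's window-free kernel embeddings. The data, the function name, and the bandwidth of twice the median distance are hypothetical choices.

```python
import numpy as np


def kernel_pca_scores(features, n_components=1, bandwidth_scale=2.0):
    """Gaussian-kernel PCA; returns the leading component scores for each row."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = bandwidth_scale**2 * np.median(d2[d2 > 0])
    k = np.exp(-d2 / (2 * sigma2))
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(h @ k @ h)           # centred Gram; ascending eigenvalues
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Hypothetical data: the slope of Y on X drifts smoothly over 50 segments of C.
rng = np.random.default_rng(0)
n_seg, per_seg = 50, 200
true_slope = np.sin(2 * np.pi * (np.arange(n_seg) + 0.5) / n_seg)
summaries = []
for b in true_slope:
    x = rng.normal(size=per_seg)
    y = b * x + 0.3 * rng.normal(size=per_seg)
    summaries.append(np.polyfit(x, y, 1))            # [slope, intercept] of P(Y | X, C=c)

driving_force = kernel_pca_scores(np.array(summaries))[:, 0]
corr = np.corrcoef(driving_force, true_slope)[0, 1]
print(f"|corr(first KPCA component, true slope)| = {abs(corr):.3f}")
```

The sign of a principal component is arbitrary, which is why the check uses the absolute correlation with the simulated slope.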

Extensions

  • Time-Varying Lagged and Instantaneous Relations: The framework can be extended by reorganizing the time series variables into blocks representing different time points. The same constraint-based approach with $C$ can then be applied to this augmented set of variables to recover both lagged and instantaneous connections. Lagged connections are naturally oriented from past to future, while instantaneous ones are oriented with the Phase II methods.
  • Stationary Confounders: The paper discusses via examples how distribution shifts can still help distinguish causal directions even in the presence of unobserved stationary confounders by inducing specific patterns of dependence between modules.
  • Combination with Functional Causal Models: For edges that remain unoriented after applying CD-NOD's methods, traditional functional causal model approaches (like Additive Noise Models) can be applied, potentially conditioning on inferred deconfounding variables from the CD-NOD process.
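
Reorganizing a time series into blocks can be sketched as below. The layout (current values first, then each lag, then $C$ as the time index) and the helper names are assumptions made for illustration. The resulting matrix could be passed to a skeleton search such as PC. After that, edges between different blocks are oriented from past to future, and edges within a block are left to Phase II.

```python
import numpy as np


def lagged_blocks(series, max_lag=1):
    """Rows [X(t), X(t-1), ..., X(t-max_lag), C] for a (T, p) array, with C = t."""
    t_len = series.shape[0]
    rows = []
    for t in range(max_lag, t_len):
        lags = [series[t - k] for k in range(max_lag + 1)]   # block k holds lag k
        rows.append(np.concatenate(lags + [[t]]))
    return np.array(rows, dtype=float)


def block_names(p, max_lag=1):
    """Column names matching lagged_blocks: lag-major, then variable index."""
    names = [f"X{j}(t)" if k == 0 else f"X{j}(t-{k})"
             for k in range(max_lag + 1) for j in range(p)]
    return names + ["C"]


def orient_lagged_edge(i, j, p):
    """Orient an edge between two variable columns (not C) of the block matrix.

    Returns (cause, effect) when the columns lie in different blocks, since the
    larger lag is earlier in time; returns None for an instantaneous edge.
    """
    lag_i, lag_j = i // p, j // p
    if lag_i == lag_j:
        return None
    return (i, j) if lag_i > lag_j else (j, i)


series = np.arange(10).reshape(5, 2)       # toy series: T = 5, p = 2
blocks = lagged_blocks(series)
print(block_names(2))                     # ['X0(t)', 'X1(t)', 'X0(t-1)', 'X1(t-1)', 'C']
print(blocks[0])                          # [2. 3. 0. 1. 1.]
print(orient_lagged_edge(0, 2, p=2))      # (2, 0): X0(t-1) -> X0(t)
```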

Relation to Soft Intervention

The paper highlights the connection between data heterogeneity/nonstationarity and soft interventions. Heterogeneity or nonstationarity can be viewed as the result of natural "soft interventions" mediated by the variable $C$, which influences the conditional distributions of some variables without necessarily breaking incoming causal links. The CD-NOD framework is more general than existing soft-intervention methods. It automatically detects which variables are "intervened" upon, handles continuous changes in intervention strength (via $C$), and allows for multiple interventions and pseudo-interventions (confounders modeled as functions of $C$).

Implementation Considerations

  • The kernel methods used (KCI test, kernel embeddings, KPCA) provide nonparametric flexibility but have $O(N^3)$ computational complexity for $N$ samples, which can be a bottleneck for very large datasets. Faster approximate kernel methods may be necessary.
  • Choosing kernel hyperparameters is crucial and may require cross-validation or other data-driven approaches.
  • The threshold $\alpha$ for detecting pseudo confounders in Phase II must be chosen carefully.
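
Random Fourier features are one example of a faster approximation; this is an assumption here, not something the paper prescribes. They replace an $N \times N$ Gaussian Gram matrix with an $N \times D$ feature matrix, so an HSIC-type statistic costs $O(N D^2)$ instead of $O(N^3)$. The function names, feature counts, and fixed bandwidth below are illustrative choices. In practice the bandwidth would be set by a heuristic such as the median distance or by a data-driven procedure.

```python
import numpy as np


def rff_features(x, n_features=2000, sigma=1.0, seed=0):
    """Random Fourier features: z(a) @ z(b) approximates exp(-||a - b||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0 / sigma, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)


def hsic_rff(x, y, n_features=500, sigma=1.0):
    """Approximate biased HSIC, ||Zx_c^T Zy_c||_F^2 / n^2, without any n x n matrix."""
    zx = rff_features(x, n_features, sigma, seed=1)
    zy = rff_features(y, n_features, sigma, seed=2)
    zx -= zx.mean(axis=0)      # centring the features plays the role of H K H
    zy -= zy.mean(axis=0)
    return np.sum((zx.T @ zy) ** 2) / x.shape[0] ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y_dep = np.sin(2 * x) + 0.1 * rng.normal(size=(1000, 1))
y_ind = rng.normal(size=(1000, 1))

# Check the kernel approximation on a small subset against the exact Gram matrix.
z = rff_features(x[:200])
exact = np.exp(-((x[:200] - x[:200].T) ** 2) / 2.0)
max_err = np.abs(z @ z.T - exact).max()
print(f"max Gram error: {max_err:.3f}")
print(f"HSIC dependent: {hsic_rff(x, y_dep):.4f}, independent: {hsic_rff(x, y_ind):.4f}")
```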

Experimental Results

The paper validates CD-NOD on both synthetic and real-world datasets.

  • Synthetic Data: Experiments show that CD-NOD (Phase I) significantly improves skeleton estimation accuracy (higher F1 and precision) compared to standard constraint-based methods and linear-system-based methods (IB, MC), which struggle with spurious edges induced by related changes. CD-NOD (Phase II) also shows superior direction identification performance. KNV (Phase III) effectively recovers underlying changing components, outperforming linear GP models and change point detection methods, especially for smooth changes.
  • Real-World Data:
    • Task fMRI: CD-NOD is applied to task fMRI data. It identifies time-varying causal modules in brain regions related to visual and language processing. The estimated graph structure aligns with neuroscientific understanding. The KNV driving forces show distinct patterns corresponding to resting states and task states, demonstrating the method's ability to capture meaningful changes.
    • Stock Returns: CD-NOD is applied to daily stock returns from the Hong Kong and US markets. It identifies time-varying modules in stocks from sectors known to be sensitive to external economic factors (finance, energy, etc.). The inferred causal structures show plausible relationships (e.g., major banks causing others). The KNV driving forces exhibit changes that align with the timing of the 2008 financial crisis and other significant market events (such as Hang Seng Bank's restructuring), suggesting that they capture relevant market dynamics.

In conclusion, CD-NOD provides a powerful and practical framework for causal discovery in settings with heterogeneous and nonstationary data. It explicitly models the effects of distribution shifts through a surrogate variable $C$ and leverages the principle of independent changes in causal modules. As a result, it can recover causal skeletons and orient edges more accurately than methods that assume stationarity or linearity. The proposed KNV method further allows for the visualization and analysis of the estimated "driving forces" behind the observed changes in causal mechanisms.