
Causal Discovery from Conditionally Stationary Time Series

Published 12 Oct 2021 in cs.LG and stat.ML | arXiv:2110.06257v4

Abstract: Causal discovery, i.e., inferring underlying causal relationships from observational data, is highly challenging for AI systems. In a time series modeling context, traditional causal discovery methods mainly consider constrained scenarios with fully observed variables and/or data from stationary time series. We develop a causal discovery approach to handle a wide class of nonstationary time series that are conditionally stationary, where the nonstationary behaviour is modeled as stationarity conditioned on a set of latent state variables. Named State-Dependent Causal Inference (SDCI), our approach is able to recover the underlying causal dependencies, with provable identifiability for the state-dependent causal structures. Empirical experiments on nonlinear particle interaction data and gene regulatory networks demonstrate SDCI's superior performance over baseline causal discovery methods. Improved results over non-causal RNNs on modeling NBA player movements demonstrate the potential of our method and motivate the use of causality-driven methods for forecasting.

References (46)
  1. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  2. David Maxwell Chickering. Optimal structure identification with greedy search. Journal of machine learning research, 3(Nov):507–554, 2002.
  3. A recurrent latent variable model for sequential data. Advances in neural information processing systems, 28, 2015.
  4. Fast and accurate deep network learning by exponential linear units (elus). In Yoshua Bengio and Yann LeCun (eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
  5. On causal discovery from time series data using fci. Probabilistic graphical models, pp.  121–128, 2010.
  6. Deep end-to-end causal inference. arXiv preprint arXiv:2202.02195, 2022.
  7. Multi-domain causal structure learning in linear systems. Advances in neural information processing systems, 31, 2018.
  8. CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning. In ICLR, 2020.
  9. Review of causal discovery methods based on graphical models. Frontiers in genetics, 10:524, 2019.
  10. Discovering temporal causal relations from subsampled data. In International Conference on Machine Learning, pp. 1898–1906. PMLR, 2015.
  11. Rhino: Deep causal temporal relationship learning with history-dependent noise. In NeurIPS 2022 Workshop on Causality for Real-world Impact, 2022. URL https://openreview.net/forum?id=Z53CEX9jh4E.
  12. Clive WJ Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: journal of the Econometric Society, pp. 424–438, 1969.
  13. Identification of time-dependent causal model: A gaussian process treatment. In Twenty-Fourth international joint conference on artificial intelligence, 2015.
  14. Causal discovery and forecasting in nonstationary environments with state-space models. In International Conference on Machine Learning, pp. 2901–2910. PMLR, 2019.
  15. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun (eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  16. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
  17. Neural relational inference for interacting systems. In International Conference on Machine Learning, pp. 2688–2697. PMLR, 2018.
  18. Causal discovery in physical systems from videos. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  9180–9192. Curran Associates, Inc., 2020.
  19. Kostya Linou. NBA player movements. https://github.com/linouk23/NBA-Player-Movements, 2016. Last accessed: 2022-08-06.
  20. Amortized causal discovery: Learning to infer causal graphs from time-series data. ArXiv, abs/2006.10833, 2020.
  21. The concrete distribution: A continuous relaxation of discrete random variables. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
  22. Causal discovery with general non-linear relationships using non-linear ica. In Uncertainty in artificial intelligence, pp.  186–195. PMLR, 2020.
  23. Kevin P Murphy et al. Dynamic bayesian networks. Probabilistic Graphical Models, M. Jordan, 7:431, 2002.
  24. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
  25. A large-scale benchmark dataset for event recognition in surveillance video. In CVPR 2011, pp.  3153–3160. IEEE, 2011.
  26. Dynotears: Structure learning from time-series data. In International Conference on Artificial Intelligence and Statistics, pp.  1595–1605. PMLR, 2020.
  27. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., 2019.
  28. Judea Pearl. Causality. Cambridge university press, 2009.
  29. Identifiability of causal graphs using functional models. In Fabio G. Cozman and Avi Pfeffer (eds.), Proceedings of the 27th Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), pp.  589–598. AUAI Press, 2011. URL http://uai.sis.pitt.edu/papers/11/p589-peters.pdf.
  30. Causal inference on time series using restricted structural equation models. Advances in Neural Information Processing Systems, 26, 2013.
  31. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15(58):2009–2053, 2014.
  32. Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.
  33. Jakob Runge. Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7):075310, 2018.
  34. Reconstructing regime-dependent causal relationships from observational time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(11):113115, 2020.
  35. Counterfactual generative networks. In International Conference on Learning Representations, 2021.
  36. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(10), 2006.
  37. Christopher A Sims. Macroeconomics and reality. Econometrica: journal of the Econometric Society, pp.  1–48, 1980.
  38. Core knowledge. Developmental science, 10(1):89–96, 2007.
  39. Peter Spirtes. An anytime algorithm for causal inference. In AISTATS, 2001.
  40. Causation, prediction, and search. MIT press, 2000.
  41. Neural granger causality. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.  1–1, 2021. doi: 10.1109/TPAMI.2021.3065601.
  42. Learning temporally causal latent processes from general temporal data. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=RDlLMjLJXdq.
  43. CLEVRER: collision events for video representation and reasoning. In ICLR, 2020.
  44. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pp.  647–655, Arlington, Virginia, USA, 2009. AUAI Press. ISBN 9780974903958.
  45. On estimation of functional causal models: general results and application to the post-nonlinear causal model. ACM Transactions on Intelligent Systems and Technology (TIST), 7(2):1–22, 2015.
  46. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In IJCAI: Proceedings of the Conference, volume 2017, pp. 1347. NIH Public Access, 2017.

Summary

  • The paper introduces SDCI, a causal discovery method built on a state-dependent TiMINo model, which recovers conditional summary graphs from conditionally stationary time series.
  • It leverages a VAE-based framework for amortized inference to efficiently uncover state-dependent causal structures from both observed and hidden states.
  • Empirical results on synthetic datasets and NBA trajectories show that SDCI improves causal graph recovery and forecasting accuracy in complex non-stationary systems.

This paper introduces State-Dependent Causal Inference (SDCI), a novel approach for causal discovery in a specific class of non-stationary time series called "conditionally stationary" time series. In such series, the non-stationary behavior arises because the underlying causal dynamics change depending on a set of "state" variables. SDCI aims to recover these state-dependent causal dependencies.

Problem Addressed:

Traditional causal discovery methods for time series often assume stationarity, which is restrictive for many real-world datasets. While some methods address non-stationarity, causal discovery under mild and realistic assumptions for such data remains an open problem. This paper tackles this by focusing on conditionally stationary time series, where non-stationarity is governed by state variables.

Core Concepts and Approach:

The SDCI method is built upon the idea of discovering "conditional summary graphs" given observed sequences.

  1. Conditionally Stationary Time Series: The dynamics of the observed system $X = \{x_1, \dots, x_N\}$ (each $x_i$ is a time series of length $T$) change based on state variables $s^t = \{s_1^t, \dots, s_N^t\}$, where each $s_i^t \in \{1, \dots, K\}$ is a categorical state for variable $x_i$ at time $t$. The time series is stationary if the states are held constant.
  2. Scenario Classes for State Observability:
    • Class 1: States are observed, and their dynamics are independent of other observed time series.
    • Class 2: States are unobserved and directly dependent on observed variables.
    • Class 3: States depend on earlier events and cannot be directly inferred from current observations.
    • Class 4: States are unknown confounders, making causal discovery ill-defined.

    SDCI is shown to work provably for fully observed states (Class 1) and empirically for hidden states (Classes 2 and 3).
  3. Conditional Summary Graph ($\mathcal{G}_{1:K}$): Instead of a single summary graph, SDCI aims to learn a set of $K$ summary graphs, $\mathcal{G}_{1:K} = \{\mathcal{G}_k : 1 \leq k \leq K\}$. Each $\mathcal{G}_k = \{\mathcal{V}, \mathcal{E}_k\}$ represents the causal structure when a variable $x_i$ is in state $k$. An edge from $x_i$ to $x_j$ is in $\mathcal{E}_k$ if, at some time $t$, $s_i^t = k$ and $x_i^t$ causes $x_j^{t+1}$. This provides a more informative representation of causal structure than a single, potentially dense, summary graph for non-stationary data.
  4. State-Dependent TiMINo: The paper extends the TiMINo (Time Series Models with Independent Noise) framework to conditionally stationary time series. Assuming a first-order Markov property, an additive noise model (ANM), and no instantaneous effects, the model is:

    $x_j^t = f_j^{s^{t-1}}\big((PA_j^1 | s^{t-1})^{t-1}\big) + \epsilon_j^t$

    where $PA_j^1 | s^{t-1} = \{x_i : x_j \in C_i(s_i^{t-1}),\ 1 \leq i \leq N\}$ is the set of state-dependent parents of $x_j$, and $C_i(k)$ denotes the children of $x_i$ when its state is $k$.

  5. Identifiability: The full time graph $\mathcal{G}^{1:T}$ is identifiable from the data distribution if the states $S$ are observed. Consequently, the conditional summary graph $\mathcal{G}_{1:K}$ is also identifiable, provided every state of each variable is visited at least once.
  6. Edge-Types: The interaction $x_i \to x_j$ at time $t$ is modeled as a categorical edge type $z_{ij}^t \in \{0, \dots, n_{\epsilon}-1\}$ (0 for "no effect"). This edge type is determined by the state of the source variable $x_i^t$: $z_{ij}^t = (\tilde{\mathcal{E}}_{s_i^t})_{ij}$, where $(\tilde{\mathcal{E}}_{k})_{ij} = w_{ijk}$ is the edge type between $i$ and $j$ when $s_i^t = k$. The goal is to learn $W = \{w_{ijk}\}$, which represents the conditional summary graphs including edge types. (A small simulation sketch of this generative process follows this list.)
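
To make the setup concrete, here is a minimal NumPy sketch (not from the paper) of a conditionally stationary time series generated by a first-order, state-dependent additive-noise model. The particular nonlinearities, the threshold-based state rule, and the names `W`, `edge_effect`, `state_of` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 5, 2, 200           # variables, states per variable, time steps
n_edge_types = 3              # edge type 0 means "no effect"

# Random conditional summary graphs with edge types: W[k, i, j] is the edge
# type on x_i -> x_j when x_i is in state k (an illustrative assumption).
W = rng.integers(0, n_edge_types, size=(K, N, N))
for k in range(K):
    np.fill_diagonal(W[k], 0)         # no self-edges, for simplicity

def edge_effect(edge_type, x_src):
    # Illustrative per-edge-type mechanisms; the paper allows general
    # nonlinear functions for each edge type.
    if edge_type == 1:
        return 0.5 * x_src
    if edge_type == 2:
        return np.tanh(x_src)
    return 0.0                        # type 0: no effect

def state_of(x):
    # A Class-2-style rule: each variable's state depends on its own value.
    return (x > 0).astype(int)

X = np.zeros((T, N))
X[0] = rng.normal(size=N)
for t in range(T - 1):
    s = state_of(X[t])                # s_i^t in {0, ..., K-1}
    for j in range(N):
        drive = sum(edge_effect(W[s[i], i, j], X[t, i])
                    for i in range(N) if i != j)
        # First-order Markov, additive-noise update (state-dependent TiMINo form).
        X[t + 1, j] = 0.9 * X[t, j] + drive + 0.1 * rng.normal()

# The conditional summary graph G_k is simply the support of W[k].
for k in range(K):
    print(f"G_{k} adjacency (source -> target):\n{(W[k] != 0).astype(int)}")
```

Recovering `W` (or at least its support) from data like `X`, with or without access to the states, is the task SDCI addresses.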

Implementation using Variational Auto-Encoder (VAE):

SDCI uses a VAE framework for amortized inference of the conditional summary graphs.

  • Generative Model (Observed States $S$):

    $p(X, W \mid S) = p_{\psi}(X \mid W, S)\, p(W)$

    The decoder $p_{\psi}(X \mid W, S)$ predicts $x_j^{t+1}$ from $x^t$, $s^t$, and $W$:

    $\tilde{x}_j^{t+1} = x_j^t + f_{p}\Big(\sum_{i\neq j} h_{ij}^t,\ x_j^t\Big)$

    $h_{ij}^t = \sum_{e>0} \mathbf{1}(z^t_{ij}=e)\, f_e(x_i^t, x_j^t)$ (message passing), where the $f_e$ are learnable functions, one per edge type, and $f_p$ aggregates the incoming messages.

  • Inference Model (Observed States $S$):

    A variational distribution $q_{\phi}(W \mid X, S) = \prod_{k,i,j} q_{\phi}(w_{ijk} \mid X, S)$ approximates the posterior. An encoder network $f_{\phi}(X, S)$ outputs logits $\boldsymbol{\phi}_{ij} \in \mathbb{R}^{K \times n_{\epsilon}}$, and $q_{\phi}(w_{ijk} \mid X, S) = \mathrm{softmax}\big((\boldsymbol{\phi}_{ij})_{k} / \tau\big)$, where $(\boldsymbol{\phi}_{ij})_{k}$ is the row corresponding to state $k$ and $\tau$ is a temperature. The Gumbel-softmax trick is used to backpropagate through the discrete edge types.

  • Hidden States $S$:

    The joint distribution is $p(X, W, S) = p(W)\prod_{t} p_{\psi}(x^{t+1} \mid x^{t}, s^{t}, W)\, p(s^{t+1} \mid x^{t+1})$. A factorized variational approximation $q_{\phi}(W, S \mid X) = q_{\phi}(W \mid X)\, q_{\phi}(S \mid X)$ is used. $q_{\phi}(W \mid X)$ is defined as in the observed-state case, except that the encoder $f_{\phi}$ takes only $X$ as input. $q_{\phi}(S \mid X) = \prod_{t,i} q_{\phi}(s_i^t \mid x_i^t)$ with $q_{\phi}(s_i^t \mid x_i^t) = \mathrm{softmax}\big(\hat{f}_{s}(x_i^t)/\gamma\big)$, where $\hat{f}_s$ is another neural network and $\gamma$ a temperature. The theoretical identifiability guarantees do not extend to hidden states.

  • Training Objective: The evidence lower bound (ELBO) is maximized: $\log p(X \mid S) \geq \mathbb{E}_{q_{\phi}(W \mid X, S)}\big[\log p_{\psi}(X \mid W, S)\big] - \mathrm{KL}\big(q_{\phi}(W \mid X, S) \,\|\, p(W)\big)$. For hidden states, the reconstruction term additionally takes an expectation over $S \sim q_{\phi}(S \mid X)$.
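
The following PyTorch-style sketch shows how these pieces could fit together for the observed-state case: Gumbel-softmax sampling of edge types from per-pair, per-state logits, one step of edge-type message passing in the decoder, and a negative-ELBO loss. It is a rough approximation under stated assumptions (Gaussian likelihood, uniform edge-type prior, small MLPs named `edge_mlps` and `f_p`), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, K, n_e, D = 5, 2, 3, 1     # variables, states, edge types (0 = none), feature dim
tau = 0.5                     # softmax / Gumbel-softmax temperature

edge_mlps = nn.ModuleList(    # one message function f_e per non-zero edge type
    [nn.Sequential(nn.Linear(2 * D, 16), nn.ReLU(), nn.Linear(16, 16))
     for _ in range(n_e - 1)])
f_p = nn.Sequential(nn.Linear(16 + D, 16), nn.ReLU(), nn.Linear(16, D))

def decode_step(x_t, s_t, phi):
    """One prediction step: x_t (N, D), s_t (N,) integer states, phi (N, N, K, n_e)."""
    # Select the logits row matching the *source* variable's current state, then
    # sample a soft one-hot edge type per pair with the Gumbel-softmax trick.
    logits = phi[torch.arange(N)[:, None], torch.arange(N)[None, :], s_t[:, None]]
    z = F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)      # (N, N, n_e)

    x_next = []
    for j in range(N):
        msg = 0.0
        for i in range(N):
            if i == j:
                continue
            pair = torch.cat([x_t[i], x_t[j]], dim=-1)
            # Weighted sum over non-zero edge types; type 0 contributes nothing.
            msg = msg + sum(z[i, j, e] * edge_mlps[e - 1](pair)
                            for e in range(1, n_e))
        x_next.append(x_t[j] + f_p(torch.cat([msg, x_t[j]], dim=-1)))
    return torch.stack(x_next)

def neg_elbo(x, s, phi, sigma=0.1):
    """x (T, N, D), s (T, N): Gaussian reconstruction term plus KL between the
    categorical edge posterior and a uniform prior (both assumptions here)."""
    recon = 0.0
    for t in range(x.shape[0] - 1):
        x_pred = decode_step(x[t], s[t], phi)
        recon = recon + ((x[t + 1] - x_pred) ** 2).sum() / (2 * sigma ** 2)
    q = F.softmax(phi / tau, dim=-1)
    kl = (q * (q.clamp_min(1e-12).log() - torch.log(torch.tensor(1.0 / n_e)))).sum()
    return recon + kl
```

In practice the logits `phi` would come from the encoder described next, and harder sampling (or an annealed `tau`) could be used once training has converged.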

Encoder Architecture:

The encoder adapts the architecture from Amortized Causal Discovery (ACD). It involves:

  1. Embedding each node's time series: $h^1_i = f_{\phi_1}(x_i^{1:T})$ (or $f_{\phi_1}(\mathrm{concat}(x_i^{1:T}, s_i^{1:T}))$ when states are observed).
  2. Message passing between nodes using a GNN to obtain updated embeddings $h^2_i$.
  3. Pairwise processing of $h^2_i, h^2_j$ to output logits $\boldsymbol{\phi}_{ij}$ for the edge types across all $K$ states.
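
A rough sketch of this encoder is below; the layer sizes, the single mean-aggregation message-passing round, and the class name `EdgeTypeEncoder` are illustrative assumptions, and the actual ACD-based encoder differs in detail.

```python
import torch
import torch.nn as nn

class EdgeTypeEncoder(nn.Module):
    def __init__(self, T, K=2, n_e=3, hidden=64):
        super().__init__()
        self.K, self.n_e = K, n_e
        self.f_emb = nn.Sequential(nn.Linear(T, hidden), nn.ReLU())            # step 1
        self.f_node = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())  # step 2
        self.f_pair = nn.Linear(2 * hidden, K * n_e)                           # step 3

    def forward(self, x):
        """x: (N, T) -- one full trajectory per variable."""
        h1 = self.f_emb(x)                                   # per-node embeddings (N, hidden)
        # One message-passing round: each node aggregates the other nodes' embeddings.
        agg = (h1.sum(0, keepdim=True) - h1) / (x.shape[0] - 1)
        h2 = self.f_node(torch.cat([h1, agg], dim=-1))       # updated embeddings (N, hidden)
        # Pairwise logits phi_ij over edge types, for every state k.
        hi = h2[:, None, :].expand(-1, h2.shape[0], -1)
        hj = h2[None, :, :].expand(h2.shape[0], -1, -1)
        phi = self.f_pair(torch.cat([hi, hj], dim=-1))       # (N, N, K * n_e)
        return phi.view(h2.shape[0], h2.shape[0], self.K, self.n_e)

# Example: edge-type logits for 5 variables observed over 200 time steps.
phi = EdgeTypeEncoder(T=200)(torch.randn(5, 200))            # shape (5, 5, 2, 3)
```

The output corresponds to the logits $\boldsymbol{\phi}_{ij}$ above and would feed the Gumbel-softmax sampling in the decoder sketch.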

Experiments and Results:

SDCI was evaluated on synthetic linear data, nonlinear spring particle data, and NBA player trajectories.

  • Synthetic Linear Data (Scenario Class 2 - hidden states):
    • SDCI outperformed baselines (TdCM, CD-NOD, SAEM, ACD) in recovering both summary graphs (SG) and conditional summary graphs (CSG).
    • ACD performed well when the true causal graph was constant.
  • Nonlinear Spring Data:
    • Scenario Class 1 (observed states): SDCI showed better edge-type identification accuracy than ACD, especially as the number of variables or states increased. Both methods were data-efficient.
    • Scenario Class 2 (hidden states): SDCI had a clear advantage in SG accuracy over baselines (ACD, CD-NOD) and produced better forecasts than ACD due to more accurate graph structures.
    • Scenario Class 3 (states change upon collisions; states observed during training): SDCI achieved significantly better edge accuracy than ACD.
  • NBA Player Trajectories (real-world data; states either defined from court position or learned):
    • SDCI outperformed ACD and a non-causal VRNN baseline in forecasting player positions.
    • SDCI with hidden states (learning 2 or 4 states) performed comparably to SDCI with observed states.
    • SDCI showed good data efficiency and generalization ability across different teams.
    • Interpretability: The states learned by SDCI in the hidden state setting on NBA data were interpretable, corresponding to meaningful court regions (e.g., mid-court line, 3-point line) and player behaviors (e.g., offense/defense).

Conclusions and Contributions:

The paper successfully develops SDCI, a method for amortized causal discovery in conditionally stationary time series.

  • Key Contributions:

    1. Introduction of the state-dependent TiMINo model.
    2. Definition of the "conditional summary graph" as a more informative causal representation for such time series.
    3. Proof of identifiability of the full time graph and the conditional summary graph when states are observed.
    4. A deep learning-based VAE framework for efficient, amortized inference of these graphs.

  • SDCI demonstrated improved accuracy in causal graph recovery and forecasting on both synthetic and complex real-world data (NBA player movements).
  • The results highlight the potential of causality-driven methods for improved forecasting and data interpretability in non-stationary systems.

The work provides a practical approach to handle a wider class of non-stationary time series by explicitly modeling the state-dependent nature of causal interactions.
