
Causal Discovery from Conditionally Stationary Time Series

Published 12 Oct 2021 in cs.LG and stat.ML | arXiv:2110.06257v4

Abstract: Causal discovery, i.e., inferring underlying causal relationships from observational data, is highly challenging for AI systems. In a time series modeling context, traditional causal discovery methods mainly consider constrained scenarios with fully observed variables and/or data from stationary time series. We develop a causal discovery approach to handle a wide class of nonstationary time series that are conditionally stationary, where the nonstationary behaviour is modeled as stationarity conditioned on a set of latent state variables. Named State-Dependent Causal Inference (SDCI), our approach is able to recover the underlying causal dependencies, with provable identifiability for the state-dependent causal structures. Empirical experiments on nonlinear particle interaction data and gene regulatory networks demonstrate SDCI's superior performance over baseline causal discovery methods. Improved results over non-causal RNNs on modeling NBA player movements demonstrate the potential of our method and motivate the use of causality-driven methods for forecasting.

References (46)
  1. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  2. David Maxwell Chickering. Optimal structure identification with greedy search. Journal of machine learning research, 3(Nov):507–554, 2002.
  3. A recurrent latent variable model for sequential data. Advances in neural information processing systems, 28, 2015.
  4. Fast and accurate deep network learning by exponential linear units (elus). In Yoshua Bengio and Yann LeCun (eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
  5. On causal discovery from time series data using fci. Probabilistic graphical models, pp.  121–128, 2010.
  6. Deep end-to-end causal inference. arXiv preprint arXiv:2202.02195, 2022.
  7. Multi-domain causal structure learning in linear systems. Advances in neural information processing systems, 31, 2018.
  8. CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning. In ICLR, 2020.
  9. Review of causal discovery methods based on graphical models. Frontiers in genetics, 10:524, 2019.
  10. Discovering temporal causal relations from subsampled data. In International Conference on Machine Learning, pp. 1898–1906. PMLR, 2015.
  11. Rhino: Deep causal temporal relationship learning with history-dependent noise. In NeurIPS 2022 Workshop on Causality for Real-world Impact, 2022. URL https://openreview.net/forum?id=Z53CEX9jh4E.
  12. Clive WJ Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: journal of the Econometric Society, pp. 424–438, 1969.
  13. Identification of time-dependent causal model: A gaussian process treatment. In Twenty-Fourth international joint conference on artificial intelligence, 2015.
  14. Causal discovery and forecasting in nonstationary environments with state-space models. In International Conference on Machine Learning, pp. 2901–2910. PMLR, 2019.
  15. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun (eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  16. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
  17. Neural relational inference for interacting systems. In International Conference on Machine Learning, pp. 2688–2697. PMLR, 2018.
  18. Causal discovery in physical systems from videos. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  9180–9192. Curran Associates, Inc., 2020.
  19. Kostya Linou. NBA player movements. https://github.com/linouk23/NBA-Player-Movements, 2016. Last accessed: 2022-08-06.
  20. Amortized causal discovery: Learning to infer causal graphs from time-series data. ArXiv, abs/2006.10833, 2020.
  21. The concrete distribution: A continuous relaxation of discrete random variables. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
  22. Causal discovery with general non-linear relationships using non-linear ica. In Uncertainty in artificial intelligence, pp.  186–195. PMLR, 2020.
  23. Kevin P Murphy et al. Dynamic bayesian networks. Probabilistic Graphical Models, M. Jordan, 7:431, 2002.
  24. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
  25. A large-scale benchmark dataset for event recognition in surveillance video. In CVPR 2011, pp.  3153–3160. IEEE, 2011.
  26. Dynotears: Structure learning from time-series data. In International Conference on Artificial Intelligence and Statistics, pp.  1595–1605. PMLR, 2020.
  27. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., 2019.
  28. Judea Pearl. Causality. Cambridge university press, 2009.
  29. Identifiability of causal graphs using functional models. In Fabio G. Cozman and Avi Pfeffer (eds.), Proceedings of the 27th Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), pp.  589–598. AUAI Press, 2011. URL http://uai.sis.pitt.edu/papers/11/p589-peters.pdf.
  30. Causal inference on time series using restricted structural equation models. Advances in Neural Information Processing Systems, 26, 2013.
  31. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15(58):2009–2053, 2014.
  32. Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.
  33. Jakob Runge. Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7):075310, 2018.
  34. Reconstructing regime-dependent causal relationships from observational time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(11):113115, 2020.
  35. Counterfactual generative networks. In International Conference on Learning Representations, 2021.
  36. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(10), 2006.
  37. Christopher A Sims. Macroeconomics and reality. Econometrica: journal of the Econometric Society, pp.  1–48, 1980.
  38. Core knowledge. Developmental science, 10(1):89–96, 2007.
  39. Peter Spirtes. An anytime algorithm for causal inference. In AISTATS, 2001.
  40. Causation, prediction, and search. MIT press, 2000.
  41. Neural granger causality. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.  1–1, 2021. doi: 10.1109/TPAMI.2021.3065601.
  42. Learning temporally causal latent processes from general temporal data. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=RDlLMjLJXdq.
  43. CLEVRER: collision events for video representation and reasoning. In ICLR, 2020.
  44. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pp.  647–655, Arlington, Virginia, USA, 2009. AUAI Press. ISBN 9780974903958.
  45. On estimation of functional causal models: general results and application to the post-nonlinear causal model. ACM Transactions on Intelligent Systems and Technology (TIST), 7(2):1–22, 2015.
  46. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In IJCAI: Proceedings of the Conference, volume 2017, pp. 1347. NIH Public Access, 2017.

Summary

  • The paper introduces SDCI, a causal discovery method built on a state-dependent TiMINo model, which recovers conditional summary graphs from conditionally stationary time series.
  • It leverages a VAE-based framework for amortized inference to efficiently uncover state-dependent causal structures from both observed and hidden states.
  • Empirical results on synthetic datasets and NBA trajectories show that SDCI improves causal graph recovery and forecasting accuracy in complex non-stationary systems.

This paper introduces State-Dependent Causal Inference (SDCI), a novel approach for causal discovery in a specific class of non-stationary time series called "conditionally stationary" time series. In such series, the non-stationary behavior arises because the underlying causal dynamics change depending on a set of "state" variables. SDCI aims to recover these state-dependent causal dependencies.

Problem Addressed:

Traditional causal discovery methods for time series often assume stationarity, which is restrictive for many real-world datasets. While some methods address non-stationarity, causal discovery under mild and realistic assumptions for such data remains an open problem. This paper tackles this by focusing on conditionally stationary time series, where non-stationarity is governed by state variables.

Core Concepts and Approach:

The SDCI method is built upon the idea of discovering "conditional summary graphs" given observed sequences.

  1. Conditionally Stationary Time Series: The dynamics of the observed system $X = \{x_1, \dots, x_N\}$ (each $x_i$ is a time series of length $T$) change based on state variables $s^t = \{s_1^t, \dots, s_N^t\}$, where each $s_i^t \in \{1, \dots, K\}$ is a categorical state for variable $x_i$ at time $t$. The time series is stationary if the states are held constant.
  2. Scenario Classes for State Observability:
    • Class 1: States are observed, and their dynamics are independent of other observed time series.
    • Class 2: States are unobserved and directly dependent on observed variables.
    • Class 3: States depend on earlier events and cannot be directly inferred from current observations.
    • Class 4: States are unknown confounders, making causal discovery ill-defined.

    SDCI is shown to work provably for fully observed states (Class 1) and empirically for hidden states (Classes 2 and 3).
  3. Conditional Summary Graph ($\mathcal{G}_{1:K}$): Instead of a single summary graph, SDCI aims to learn a set of $K$ summary graphs, $\mathcal{G}_{1:K} = \{\mathcal{G}_k : 1 \leq k \leq K\}$. Each $\mathcal{G}_k = \{\mathcal{V}, \mathcal{E}_k\}$ represents the causal structure when a variable $x_i$ is in state $k$. An edge from $x_i$ to $x_j$ is in $\mathcal{E}_k$ if, at some time $t$, $s_i^t = k$ and $x_i^t$ causes $x_j^{t+1}$. This provides a more informative representation of causal structure than a single, potentially dense, summary graph for non-stationary data.
  4. State-Dependent TiMINo: The paper extends the TiMINo (Time Series Models with Independent Noise) framework to conditionally stationary time series. Assuming a first-order Markov property, an additive noise model (ANM), and no instantaneous effects, the model is:

    $x_j^t = f_j^{s^{t-1}}\big((PA_j^1 | s^{t-1})^{t-1}\big) + \epsilon_j^t$

    where $PA_j^1 | s^{t-1} = \{x_i : x_j \in C_i(s_i^{t-1}),\ 1 \leq i \leq N\}$ is the set of state-dependent parents of $x_j$, and $C_i(k)$ denotes the children of $x_i$ when its state is $k$.

  5. Identifiability: The full time graph $\mathcal{G}^{1:T}$ is identifiable from the data distribution if the states $S$ are observed. Consequently, the conditional summary graph $\mathcal{G}_{1:K}$ is also identifiable, provided every state of each variable is visited at least once.
  6. Edge-Types: The interaction $x_i \to x_j$ at time $t$ is modeled as a categorical edge type $z_{ij}^t \in \{0, \dots, n_{\epsilon}-1\}$ (0 for "no effect"). This edge type is determined by the state of the source variable $x_i^t$: $z_{ij}^t = (\tilde{\mathcal{E}}_{s_i^t})_{ij}$, where $(\tilde{\mathcal{E}}_{k})_{ij} = w_{ijk}$ is the edge type between $i$ and $j$ when $s_i^t = k$. The goal is to learn $W = \{w_{ijk}\}$, which represents the conditional summary graphs including edge types. (A small simulation sketch of this generative process follows this list.)
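
To make the setup concrete, here is a minimal NumPy sketch (not from the paper) of a conditionally stationary time series generated by a first-order, state-dependent additive-noise model. The particular nonlinearities, the threshold-based state rule, and the names `W`, `edge_effect`, `state_of` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 5, 2, 200           # variables, states per variable, time steps
n_edge_types = 3              # edge type 0 means "no effect"

# Random conditional summary graphs with edge types: W[k, i, j] is the edge
# type on x_i -> x_j when x_i is in state k (an illustrative assumption).
W = rng.integers(0, n_edge_types, size=(K, N, N))
for k in range(K):
    np.fill_diagonal(W[k], 0)         # no self-edges, for simplicity

def edge_effect(edge_type, x_src):
    # Illustrative per-edge-type mechanisms; the paper allows general
    # nonlinear functions for each edge type.
    if edge_type == 1:
        return 0.5 * x_src
    if edge_type == 2:
        return np.tanh(x_src)
    return 0.0                        # type 0: no effect

def state_of(x):
    # A Class-2-style rule: each variable's state depends on its own value.
    return (x > 0).astype(int)

X = np.zeros((T, N))
X[0] = rng.normal(size=N)
for t in range(T - 1):
    s = state_of(X[t])                # s_i^t in {0, ..., K-1}
    for j in range(N):
        drive = sum(edge_effect(W[s[i], i, j], X[t, i])
                    for i in range(N) if i != j)
        # First-order Markov, additive-noise update (state-dependent TiMINo form).
        X[t + 1, j] = 0.9 * X[t, j] + drive + 0.1 * rng.normal()

# The conditional summary graph G_k is simply the support of W[k].
for k in range(K):
    print(f"G_{k} adjacency (source -> target):\n{(W[k] != 0).astype(int)}")
```

Recovering `W` (or at least its support) from data like `X`, with or without access to the states, is the task SDCI addresses.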

Implementation using Variational Auto-Encoder (VAE):

SDCI uses a VAE framework for amortized inference of the conditional summary graphs.

  • Generative Model (Observed States $S$):

    $p(X, W \mid S) = p_{\psi}(X \mid W, S)\, p(W)$

    The decoder $p_{\psi}(X \mid W, S)$ predicts $x_j^{t+1}$ from $x^t$, $s^t$, and $W$:

    $\tilde{x}_j^{t+1} = x_j^t + f_{p}\Big(\sum_{i\neq j} h_{ij}^t,\ x_j^t\Big)$

    $h_{ij}^t = \sum_{e>0} \mathbf{1}(z^t_{ij}=e)\, f_e(x_i^t, x_j^t)$ (message passing), where the $f_e$ are learnable functions, one per edge type, and $f_p$ aggregates the incoming messages.

  • Inference Model (Observed States $S$):

    A variational distribution $q_{\phi}(W \mid X, S) = \prod_{k,i,j} q_{\phi}(w_{ijk} \mid X, S)$ approximates the posterior. An encoder network $f_{\phi}(X, S)$ outputs logits $\boldsymbol{\phi}_{ij} \in \mathbb{R}^{K \times n_{\epsilon}}$, and $q_{\phi}(w_{ijk} \mid X, S) = \mathrm{softmax}\big((\boldsymbol{\phi}_{ij})_{k} / \tau\big)$, where $(\boldsymbol{\phi}_{ij})_{k}$ is the row corresponding to state $k$ and $\tau$ is a temperature. The Gumbel-softmax trick is used to backpropagate through the discrete edge types.

  • Hidden States $S$:

    The joint distribution is $p(X, W, S) = p(W)\prod_{t} p_{\psi}(x^{t+1} \mid x^{t}, s^{t}, W)\, p(s^{t+1} \mid x^{t+1})$. A factorized variational approximation $q_{\phi}(W, S \mid X) = q_{\phi}(W \mid X)\, q_{\phi}(S \mid X)$ is used. $q_{\phi}(W \mid X)$ is defined as in the observed-state case, except that the encoder $f_{\phi}$ takes only $X$ as input. $q_{\phi}(S \mid X) = \prod_{t,i} q_{\phi}(s_i^t \mid x_i^t)$ with $q_{\phi}(s_i^t \mid x_i^t) = \mathrm{softmax}\big(\hat{f}_{s}(x_i^t)/\gamma\big)$, where $\hat{f}_s$ is another neural network and $\gamma$ a temperature. The theoretical identifiability guarantees do not extend to hidden states.

  • Training Objective: The evidence lower bound (ELBO) is maximized: $\log p(X \mid S) \geq \mathbb{E}_{q_{\phi}(W \mid X, S)}\big[\log p_{\psi}(X \mid W, S)\big] - \mathrm{KL}\big(q_{\phi}(W \mid X, S) \,\|\, p(W)\big)$. For hidden states, the reconstruction term additionally takes an expectation over $S \sim q_{\phi}(S \mid X)$.
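
The following PyTorch-style sketch shows how these pieces could fit together for the observed-state case: Gumbel-softmax sampling of edge types from per-pair, per-state logits, one step of edge-type message passing in the decoder, and a negative-ELBO loss. It is a rough approximation under stated assumptions (Gaussian likelihood, uniform edge-type prior, small MLPs named `edge_mlps` and `f_p`), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, K, n_e, D = 5, 2, 3, 1     # variables, states, edge types (0 = none), feature dim
tau = 0.5                     # softmax / Gumbel-softmax temperature

edge_mlps = nn.ModuleList(    # one message function f_e per non-zero edge type
    [nn.Sequential(nn.Linear(2 * D, 16), nn.ReLU(), nn.Linear(16, 16))
     for _ in range(n_e - 1)])
f_p = nn.Sequential(nn.Linear(16 + D, 16), nn.ReLU(), nn.Linear(16, D))

def decode_step(x_t, s_t, phi):
    """One prediction step: x_t (N, D), s_t (N,) integer states, phi (N, N, K, n_e)."""
    # Select the logits row matching the *source* variable's current state, then
    # sample a soft one-hot edge type per pair with the Gumbel-softmax trick.
    logits = phi[torch.arange(N)[:, None], torch.arange(N)[None, :], s_t[:, None]]
    z = F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)      # (N, N, n_e)

    x_next = []
    for j in range(N):
        msg = 0.0
        for i in range(N):
            if i == j:
                continue
            pair = torch.cat([x_t[i], x_t[j]], dim=-1)
            # Weighted sum over non-zero edge types; type 0 contributes nothing.
            msg = msg + sum(z[i, j, e] * edge_mlps[e - 1](pair)
                            for e in range(1, n_e))
        x_next.append(x_t[j] + f_p(torch.cat([msg, x_t[j]], dim=-1)))
    return torch.stack(x_next)

def neg_elbo(x, s, phi, sigma=0.1):
    """x (T, N, D), s (T, N): Gaussian reconstruction term plus KL between the
    categorical edge posterior and a uniform prior (both assumptions here)."""
    recon = 0.0
    for t in range(x.shape[0] - 1):
        x_pred = decode_step(x[t], s[t], phi)
        recon = recon + ((x[t + 1] - x_pred) ** 2).sum() / (2 * sigma ** 2)
    q = F.softmax(phi / tau, dim=-1)
    kl = (q * (q.clamp_min(1e-12).log() - torch.log(torch.tensor(1.0 / n_e)))).sum()
    return recon + kl
```

In practice the logits `phi` would come from the encoder described next, and harder sampling (or an annealed `tau`) could be used once training has converged.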

Encoder Architecture:

The encoder adapts the architecture from Amortized Causal Discovery (ACD). It involves:

  1. Embedding each node's time series: $h^1_i = f_{\phi_1}(x_i^{1:T})$ (or $f_{\phi_1}(\mathrm{concat}(x_i^{1:T}, s_i^{1:T}))$ when states are observed).
  2. Message passing between nodes using a GNN to obtain updated embeddings $h^2_i$.
  3. Pairwise processing of $h^2_i, h^2_j$ to output logits $\boldsymbol{\phi}_{ij}$ for the edge types across all $K$ states.
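
A rough sketch of this encoder is below; the layer sizes, the single mean-aggregation message-passing round, and the class name `EdgeTypeEncoder` are illustrative assumptions, and the actual ACD-based encoder differs in detail.

```python
import torch
import torch.nn as nn

class EdgeTypeEncoder(nn.Module):
    def __init__(self, T, K=2, n_e=3, hidden=64):
        super().__init__()
        self.K, self.n_e = K, n_e
        self.f_emb = nn.Sequential(nn.Linear(T, hidden), nn.ReLU())            # step 1
        self.f_node = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())  # step 2
        self.f_pair = nn.Linear(2 * hidden, K * n_e)                           # step 3

    def forward(self, x):
        """x: (N, T) -- one full trajectory per variable."""
        h1 = self.f_emb(x)                                   # per-node embeddings (N, hidden)
        # One message-passing round: each node aggregates the other nodes' embeddings.
        agg = (h1.sum(0, keepdim=True) - h1) / (x.shape[0] - 1)
        h2 = self.f_node(torch.cat([h1, agg], dim=-1))       # updated embeddings (N, hidden)
        # Pairwise logits phi_ij over edge types, for every state k.
        hi = h2[:, None, :].expand(-1, h2.shape[0], -1)
        hj = h2[None, :, :].expand(h2.shape[0], -1, -1)
        phi = self.f_pair(torch.cat([hi, hj], dim=-1))       # (N, N, K * n_e)
        return phi.view(h2.shape[0], h2.shape[0], self.K, self.n_e)

# Example: edge-type logits for 5 variables observed over 200 time steps.
phi = EdgeTypeEncoder(T=200)(torch.randn(5, 200))            # shape (5, 5, 2, 3)
```

The output corresponds to the logits $\boldsymbol{\phi}_{ij}$ above and would feed the Gumbel-softmax sampling in the decoder sketch.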

Experiments and Results:

SDCI was evaluated on synthetic linear data, nonlinear spring particle data, and NBA player trajectories.

  • Synthetic Linear Data (Scenario Class 2 - hidden states):
    • SDCI outperformed baselines (TdCM, CD-NOD, SAEM, ACD) in recovering both summary graphs (SG) and conditional summary graphs (CSG).
    • ACD performed well when the true causal graph was constant.
  • Nonlinear Spring Data:
    • Scenario Class 1 (observed states): SDCI showed better edge-type identification accuracy than ACD, especially as the number of variables or states increased. Both methods were data-efficient.
    • Scenario Class 2 (hidden states): SDCI had a clear advantage in SG accuracy over baselines (ACD, CD-NOD) and produced better forecasts than ACD due to more accurate graph structures.
    • Scenario Class 3 (states change upon collisions; states observed during training): SDCI achieved significantly better edge accuracy than ACD.
  • NBA Player Trajectories (real-world data; states either defined from court position or learned):
    • SDCI outperformed ACD and a non-causal VRNN baseline in forecasting player positions.
    • SDCI with hidden states (learning 2 or 4 states) performed comparably to SDCI with observed states.
    • SDCI showed good data efficiency and generalization ability across different teams.
    • Interpretability: The states learned by SDCI in the hidden state setting on NBA data were interpretable, corresponding to meaningful court regions (e.g., mid-court line, 3-point line) and player behaviors (e.g., offense/defense).

Conclusions and Contributions:

The paper successfully develops SDCI, a method for amortized causal discovery in conditionally stationary time series.

  • Key Contributions:

    1. Introduction of the state-dependent TiMINo model.
    2. Definition of the "conditional summary graph" as a more informative causal representation for such time series.
    3. Proof of identifiability of the full time graph and the conditional summary graph when states are observed.
    4. A deep learning-based VAE framework for efficient, amortized inference of these graphs.

  • SDCI demonstrated improved accuracy in causal graph recovery and forecasting on both synthetic and complex real-world data (NBA player movements).
  • The results highlight the potential of causality-driven methods for improved forecasting and data interpretability in non-stationary systems.

The work provides a practical approach to handle a wider class of non-stationary time series by explicitly modeling the state-dependent nature of causal interactions.
