Identifiability Problem for Interventions

Updated 30 June 2025

The identifiability problem for interventions asks whether causal effects in time series can be computed using only observational data and a summary causal graph.
It involves determining a common backdoor set that blocks all confounding paths across every detailed full-time causal graph consistent with the abstracted model.
Efficient algorithms enable researchers to verify graphical conditions and construct valid adjustment sets, ensuring reliable causal inference from complex data.

The identifiability problem for interventions in empirical and statistical causal inference asks whether the total effect of an intervention—such as the causal effect of setting a variable (or a subset of variables) to specific values—can be determined from observed data. The challenge is compounded in time series domains where only coarse abstractions of the true underlying causal system, such as summary causal graphs (SCGs), are available, rather than full, time-resolved causal graphs. Recent research has advanced the understanding of identifiability in this more abstracted context by establishing necessary and sufficient graphical and algorithmic criteria for the existence of a common backdoor set, enabling practical identifiability by adjustment in time series modeled by SCGs.

1. Formalization of the Identifiability Problem for Interventions

Identifiability of interventions involves determining if the causal effect

$P\left(\bigl(y^j_{t_j}\bigr)_j \mid \bigl(do(x^i_{t_i})\bigr)_i\right)$

can be written as a do-free formula—that is, using only observational data, without assuming knowledge of all underlying time-indexed causal relations. In time series, this means identifying whether the effect of manipulating certain variables at specified times on a set of outcomes can be computed from the observable joint distribution and the abstraction of the system, namely the summary causal graph (SCG), rather than the full time-indexed graph.

The solution to this problem is of both theoretical and practical importance, as it determines which cause-effect queries can be credibly answered from complex time series data, when only partial or high-level domain knowledge is available.

2. Summary Causal Graphs and Their Distinction from Full-Time Causal Graphs

A summary causal graph (SCG) is a graphical abstraction:

Nodes: Each node corresponds to an entire measurable time series (e.g., a variable like temperature, not indexed at a particular time).
Edges: A directed edge $X \to Y$ exists if, at some time and lag, there is a direct causal effect from $X$ to $Y$ in the true but unobserved full-time causal graph (FTCG).

This abstraction discards the explicit timing and lag information present in an FTCG, aggregating over all possible lags:

FTCG: Vertices are time-indexed variables (e.g., $X_t$), allowing for precise modeling of temporal causality, including lags and instantaneous feedbacks.
SCG: Captures only the presence or absence of modular cause-effect relations between series, with no attachment to specific lags or time points. SCGs might have cycles and self-loops even when the full graphs are acyclic.

This makes SCGs easier to elicit from experts or data, but the resulting coarsening introduces new challenges for identifiability, as many distinct detailed causal processes can correspond to the same SCG.
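The coarsening can be illustrated with a minimal sketch. It assumes, purely for illustration, that an FTCG is stationary and can be written as a set of `(source_series, target_series, lag)` triples; the function name `summarize` and the two toy graphs are hypothetical and not taken from the source.

```python
def summarize(ftcg_edges):
    """Collapse lagged edges (source, target, lag) into SCG edges (source, target)."""
    return {(source, target) for source, target, _lag in ftcg_edges}


# Two different acyclic full-time graphs over the series X and Y.
# Lag 0 is an instantaneous effect; lag k > 0 points k steps forward in time.
ftcg_a = {("X", "Y", 0), ("Y", "X", 1), ("X", "X", 1)}
ftcg_b = {("X", "Y", 2), ("Y", "X", 1), ("X", "X", 3)}

scg_a = summarize(ftcg_a)
scg_b = summarize(ftcg_b)

# Both detailed graphs produce the same summary graph, which contains
# a self-loop on X and the cycle X -> Y -> X.
same_summary = scg_a == scg_b
has_self_loop = ("X", "X") in scg_a
has_cycle = ("X", "Y") in scg_a and ("Y", "X") in scg_a
```

Both toy FTCGs are acyclic because every edge that returns to `X` has a positive lag, yet their shared SCG is cyclic. An adjustment set derived from the SCG alone therefore has to be valid for both of them, and for every other compatible FTCG.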

3. Common Backdoor Set: Definition and Role in Adjustment-Based Identifiability

A common backdoor set is a set of variables in the SCG that enables adjustment for confounding in all possible compatible detailed full-time causal graphs. It generalizes the classic backdoor criterion:

Definition: A set $\mathcal{Z} \subseteq \mathcal{V}^s$ (nodes of the SCG) is a common backdoor set for interventions $\{x^i_{t_i}\}_i$ and outcomes $\{y^j_{t_j}\}_j$ if $\mathcal{Z}$ satisfies the backdoor criterion for every possible FTCG compatible with the SCG.
Mathematical property: For every FTCG consistent with the SCG, $\mathcal{Z}$ blocks all backdoor (i.e., non-causal) paths from any intervention to any outcome and contains no descendant (in any such FTCG) of any intervention variable.

When such a set exists, identifiability by adjustment holds. The causal query can be answered with the standard backdoor adjustment formula:

$P\bigl((y^j_{t_j})_j \mid (do(x^i_{t_i}))_i\bigr) = \sum_z P\bigl((y^j_{t_j})_j \mid (x^i_{t_i})_i, z\bigr)P(z)$

where the sum (or integral, for continuous variables) ranges over the values $z$ of all variables in $\mathcal{Z}$.

The existence of a common backdoor set ensures that (i) the adjustment is valid for all plausible underlying dynamical causal structures consistent with the available abstraction, and (ii) causal effects can be estimated exclusively from observational data.
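As a worked illustration of the adjustment formula, the following sketch evaluates it on a toy discrete distribution. Everything here is an assumption made for the example: three binary variables, with `z` standing in for a single adjustment variable (for instance a lagged confounder), `x` for the intervened variable and `y` for the outcome, and made-up probabilities. It is not the estimator used in the source.

```python
from itertools import product

# Hypothetical generating model: z confounds x and y.
p_z = {0: 0.5, 1: 0.5}                       # P(z)
p_x1_given_z = {0: 0.2, 1: 0.8}              # P(x = 1 | z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,   # P(y = 1 | x, z), keyed by (x, z)
                 (0, 1): 0.5, (1, 1): 0.7}


def bern(p_one, value):
    """Probability of `value` under a Bernoulli with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Observational joint P(z, x, y): the only object the formula is allowed to use.
joint = {
    (z, x, y): p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_xz[(x, z)], y)
    for z, x, y in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of the assignments given as keyword arguments."""
    return sum(
        p for (z, x, y), p in joint.items()
        if all({"z": z, "x": x, "y": y}[name] == value for name, value in fixed.items())
    )


def backdoor_adjusted(y, x):
    """sum_z P(y | x, z) P(z), computed from the observational joint."""
    return sum(prob(z=z, x=x, y=y) / prob(z=z, x=x) * prob(z=z) for z in (0, 1))


def naive_conditional(y, x):
    """P(y | x), which mixes the causal effect with confounding through z."""
    return prob(x=x, y=y) / prob(x=x)


adjusted = backdoor_adjusted(y=1, x=1)   # 0.3 * 0.5 + 0.7 * 0.5 = 0.5
naive = naive_conditional(y=1, x=1)      # 0.3 * 0.2 + 0.7 * 0.8 = 0.62
```

The gap between the two numbers is the confounding bias that adjustment on `z` removes. In the time series setting the same computation would run over the variables of a common backdoor set, which is only legitimate when that set is valid in every compatible FTCG.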

4. Necessary and Sufficient Conditions for Identifiability by Common Backdoor

The main theoretical result establishes necessary and sufficient graphical conditions under which identifiability by common backdoor is possible in SCGs for time series interventions.

Let $CD$ (cone of descendants) denote the union, over all possible FTCGs compatible with the SCG, of the descendants of each intervention (after removing all other interventions). Define the non-conditionable set as $NC = CD \setminus \mathrm{Interventions}$.

Theorem (Summarized):

The effect $P(y_t \mid do(x^1_{t-\gamma_1}), \ldots, do(x^n_{t-\gamma_n}))$ is identifiable by a common backdoor in the SCG if and only if, for every intervention and every FTCG compatible with the SCG, there is no collider-free backdoor path from any intervention to any outcome that remains entirely within the non-conditionable set $NC$ (the cone of descendants with the interventions removed).

That is, a backdoor path without colliders is open unless one of its variables is conditioned on, and variables that may be descendants of an intervention cannot be conditioned on. Such a path therefore cannot be blocked by any admissible adjustment set. If one exists in any compatible FTCG, identifiability by common backdoor fails.

When the condition holds, any set outside the cone of descendants (besides interventions themselves) is a valid universal adjustment set.

Practical algorithmic implication: The identification criterion and adjustment set construction can be checked and implemented efficiently (pseudo-linear time) without enumerating all possible underlying FTCGs, by combinatorially analyzing the SCG and accessible descendant sets.

5. Efficient Algorithms for Deciding and Constructing Adjustment Sets

The paper introduces efficient algorithms for determining identifiability by common backdoor and, when possible, constructing a valid adjustment set:

Step 1: Compute the earliest time each series enters the non-conditionable set ($t_{NC}(F)$ for each series $F$); this is done via dynamic programming/traversal of the SCG.
Step 2: For each variable and each intervention, check, using graph traversal, if there is a directed path in the cone of descendants allowing unblocked backdoor connectivity between interventions and outcomes.
Step 3: If any such path is found, identifiability fails. Otherwise, the complement of the non-conditionable set provides the adjustment set.
Complexity: The algorithms run in

$\mathcal{O}\left(n \cdot (|\mathcal{E}^s| + |\mathcal{V}^s| \log |\mathcal{V}^s|)\right)$

with $n$ interventions, and $|\mathcal{E}^s|$ and $|\mathcal{V}^s|$ the edge and node counts of the SCG.

The algorithms use only the structure of the SCG and do not require explicit consideration or construction of the exponentially large set of compatible FTCGs.
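The flavor of Step 1 can be conveyed with a simplified sketch that works on the SCG alone. It is not the paper's algorithm: it uses a plain breadth-first search rather than whatever traversal yields the logarithmic factor, it ignores the rule that other interventions are removed before descendants are computed, and it does not subtract the intervention points themselves. It also assumes that every edge between distinct series may be instantaneous in some compatible FTCG, while a path returning to the intervened series needs at least one positive lag because FTCGs are acyclic. The names `earliest_cone_times` and `outside_cone` are hypothetical.

```python
import math
from collections import deque


def earliest_cone_times(scg, interventions):
    """Earliest time at which each series may hold a descendant of an intervention.

    scg: dict mapping a series name to the set of its children in the SCG.
    interventions: list of (series, time) pairs.
    """
    earliest = {}
    for source, t in interventions:
        # Collect every series reachable from `source` by at least one edge.
        reached = set()
        queue = deque(scg.get(source, ()))
        while queue:
            node = queue.popleft()
            if node in reached:
                continue
            reached.add(node)
            queue.extend(scg.get(node, ()))
        for node in reached:
            # Other series can be hit at time t through instantaneous edges;
            # the intervened series itself only from t + 1 onward.
            candidate = t + 1 if node == source else t
            earliest[node] = min(earliest.get(node, math.inf), candidate)
    return earliest


def outside_cone(series, time, earliest):
    """True if the time-indexed variable lies strictly before the cone."""
    return time < earliest.get(series, math.inf)


# Toy SCG: U -> X, X -> X (self-loop), X -> Y, Y -> W, with do(X at time 5).
scg = {"U": {"X"}, "X": {"X", "Y"}, "Y": {"W"}}
earliest = earliest_cone_times(scg, [("X", 5)])
```

In this toy graph `U` never enters the cone, so any `U` variable remains a candidate for adjustment, whereas `Y` and `W` are excluded from time 5 onward. The traversal visits each SCG edge at most once per intervention, which matches the spirit, though not the exact bound, of the stated complexity.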

6. Broader Impact and Applications

These results provide:

Complete graphical characterizations for identifiability by adjustment via common backdoor in time series domains where only summary graphs are available.
Scalable algorithms for practitioners to certify when and how total intervention effects (possibly with multiple interventions and multiple outcomes) can be expressed via adjustment from observational data.
Extensions to practical domains, including medicine, monitoring and diagnosis of complex dynamic systems, economics, and any field relying on observational time series data with partially known causal structure.

This framework enables reliable, constructive answers to the question: Can causal effects of interventions in partially or abstractly specified time series be estimated from data, and what variables must be adjusted for? When the answer is negative, the algorithms constructively show why not by exhibiting the problematic path.

Feature	Description	Example/Usage
Summary Causal Graph (SCG)	Abstraction encoding existence, but not timing/lag, of causal edges between series	Nodes: series; Edges: causal links at some lag
Common Backdoor Set	Set blocking all confounding per all compatible underlying FTCGs	Used in adjustment formula for identifiability
Necessary & Sufficient Condition	No collider-free backdoor path within the non-conditionable set in any compatible FTCG	Immediate graphical criterion on SCG
Complexity	Pseudo-linear in the size of the SCG, multiplied by the number of interventions	Efficient in large-scale, real-world graphs
Do-free Adjustment Formula	$P(y \mid do(x)) = \sum_z P(y \mid x, z) P(z)$, with $z$ in the common backdoor set	Causal effect computable from observed data

A major advance is the generalization of classical identifiability theory, harmonizing it with the realities of time series abstraction and offering practitioners actionable conditions and algorithms for a core task in applied causal inference.
