Partial Functional Dynamic BDCM

Updated 3 September 2025

The paper introduces a framework that combines diffusion-based generative modeling with explicit backdoor adjustments to address spatial and temporal confounders.
It uses node-specific encoder–decoder mechanisms and multi-resolution functional representations to support accurate counterfactual estimation.
Empirical validation on synthetic and real-world datasets reports better MSE, MMD, and CRPS scores than existing methods.

The Partial Functional Dynamic Backdoor Diffusion-based Causal Model (PFD-BDCM) is a framework for causal inference that systematically integrates diffusion-based generative modeling, region-specific structural equations, conditional autoregressive processes, and valid backdoor adjustment sets. The model is designed to address confounding bias due to unmeasured confounders exhibiting spatial heterogeneity and temporal dependency, as well as to accommodate multi-resolution functional data. PFD-BDCM provides rigorous theoretical guarantees linking reconstruction error to counterfactual estimation error and demonstrates empirical superiority over existing methods in both synthetic and real-world scenarios (Liu et al., 30 Aug 2025).

1. Model Architecture and Functional Representation

PFD-BDCM represents each endogenous node in the causal graph with a node-specific diffusion-based encoder–decoder mechanism. Each variable $X_k$ is encoded via a forward diffusion process, generating a latent representation $Z_k$ that implicitly models exogenous noise. The reverse diffusion process decodes $Z_k$ back to $X_k$ by conditioning on the designated backdoor adjustment set $X_{\mathcal{B}_k}$.

The encoding and decoding processes are specified as:

$$\text{Enc-BDCM:}\quad Z_k^{t+1} := \sqrt{\frac{\alpha_{t+1}}{\alpha_t}}\, Z_k^{t} + \epsilon_\theta^k(Z_k^{t}, X_{\mathcal{B}_k}, t) \left[ \sqrt{1-\alpha_{t+1}} - \sqrt{\frac{\alpha_{t+1}(1-\alpha_{t})}{\alpha_t}} \right],\quad t=0,\ldots,T-1$$

$$\text{Dec-BDCM:}\quad \hat{X}_k^{t-1} := \sqrt{\frac{\alpha_{t-1}}{\alpha_t}}\, \hat{X}_k^{t} - \epsilon_\theta^k(\hat{X}_k^{t}, X_{\mathcal{B}_k}, t)\left[ \sqrt{\frac{\alpha_{t-1}(1-\alpha_t)}{\alpha_t}} - \sqrt{1-\alpha_{t-1}} \right]$$
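The two update rules can be traced numerically. The following is a minimal sketch for a scalar node, not the authors' implementation: the linear schedule in `make_alphas` and the noise predictor `toy_eps` are hypothetical stand-ins for a trained $\epsilon_\theta^k$. Because `toy_eps` depends only on the adjustment set, decoding inverts encoding exactly here; with a real network the round trip is only approximate.

```python
import math

def make_alphas(T):
    """Hypothetical schedule: alpha_0 = 1, decreasing linearly to 0.05 at step T."""
    return [1.0 - 0.95 * t / T for t in range(T + 1)]

def toy_eps(z, x_b, t):
    """Stand-in for the trained noise predictor eps_theta^k (assumption).
    It ignores z and t and uses only the adjustment set x_b."""
    return math.tanh(sum(x_b))

def encode(x, x_b, alphas, eps=toy_eps):
    """Enc-BDCM: run t = 0, ..., T-1 from the observation to the latent Z_k^T."""
    z = float(x)
    for t in range(len(alphas) - 1):
        a_t, a_next = alphas[t], alphas[t + 1]
        coef = math.sqrt(1 - a_next) - math.sqrt(a_next * (1 - a_t) / a_t)
        z = math.sqrt(a_next / a_t) * z + eps(z, x_b, t) * coef
    return z

def decode(z, x_b, alphas, eps=toy_eps):
    """Dec-BDCM: run t = T, ..., 1 from the latent back to the data scale."""
    x = float(z)
    for t in range(len(alphas) - 1, 0, -1):
        a_t, a_prev = alphas[t], alphas[t - 1]
        coef = math.sqrt(a_prev * (1 - a_t) / a_t) - math.sqrt(1 - a_prev)
        x = math.sqrt(a_prev / a_t) * x - eps(x, x_b, t) * coef
    return x

alphas = make_alphas(50)
x_b = [0.5, 0.1]                 # values of the backdoor adjustment set
z = encode(0.3, x_b, alphas)     # latent code for the observation 0.3
x_rec = decode(z, x_b, alphas)   # reconstruction from the latent code
```

Conditioning enters both passes through the same `x_b` argument, which mirrors how the adjustment set $X_{\mathcal{B}_k}$ appears in both Enc-BDCM and Dec-BDCM.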

Variables observed at heterogeneous resolutions are projected via basis expansions:

$$X_{ijm} = \int b_{ijm}(t)\, X_{ij}(t)\, dt$$

where $b_{ijm}(t)$ are orthogonal basis functions and $t$ here indexes the functional domain rather than the diffusion step. The coefficients $X_{ijm}$ provide a multi-resolution discretization of the functional data.
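As an illustration of the projection, the sketch below computes basis coefficients by trapezoidal integration on $[0, 1]$. The orthonormal cosine basis and the test curve `x_curve` are assumptions chosen for the example; the source does not specify which basis is used.

```python
import math

def cosine_basis(m):
    """Orthonormal cosine basis on [0, 1] (an assumed choice of b_m)."""
    def b(t):
        return 1.0 if m == 0 else math.sqrt(2.0) * math.cos(m * math.pi * t)
    return b

def project(x_func, n_basis, n_grid=2000):
    """Approximate X_m = integral of b_m(t) * X(t) dt with the trapezoid rule."""
    h = 1.0 / n_grid
    grid = [i * h for i in range(n_grid + 1)]
    coeffs = []
    for m in range(n_basis):
        b = cosine_basis(m)
        vals = [b(t) * x_func(t) for t in grid]
        integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        coeffs.append(integral)
    return coeffs

def x_curve(t):
    """Hypothetical functional observation built from two basis functions."""
    return 2.0 * cosine_basis(1)(t) + 0.5 * cosine_basis(2)(t)

coeffs = project(x_curve, n_basis=4)
```

Since the curve is built from the first and second cosine functions, the recovered coefficients are close to 0, 2, 0.5, and 0. Choosing a larger `n_basis` for finely sampled variables and a smaller one for coarse variables is one way to place data of different resolutions in a common coefficient space.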

The structural causal graph is dynamic (PFST-DSCM): nodes are organized across spatial regions and temporal indices, a topological ordering fixes the sequence in which nodes are processed so that causes precede effects, and backdoor adjustment sets are incorporated at both the encoding and decoding stages.

2. Systematic Backdoor Adjustment and Diffusion-Based Sampling

PFD-BDCM incorporates a valid backdoor adjustment set $\mathcal{B}_k$ for each node $k$, chosen according to the backdoor criterion. During sampling, for both interventional ($do(X_l := \gamma_l)$) and counterfactual queries, the decoder for each node conditions not only on its parent nodes but also on the entire adjustment set identified from the graph structure. Conditioning on a valid adjustment set blocks the non-causal (backdoor) paths, which mitigates the bias induced by confounders.

Sampling is performed via diffusion processes:

For observational queries, each node is reconstructed sequentially via its diffusion decoder, using sampled latent variables.
For interventional queries, intervened values are held fixed, and non-intervened nodes are decoded topologically, conditioning on both observed and previously decoded variables.
For counterfactual estimation, the encoder maps factual data to latent codes, which are then used in prediction under the new, interventional parent settings.
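The three query types above can be sketched on a toy graph. This is an illustration under stated assumptions rather than the paper's sampler: the graph `C -> X`, `C -> Y`, `X -> Y`, the linear `mean_fn`, and the additive `encode`/`decode` pair are hypothetical stand-ins for the node-wise diffusion encoder and decoder.

```python
# Toy graph: C -> X, C -> Y, X -> Y, so {C} is a valid backdoor set for X -> Y.
ORDER = ["C", "X", "Y"]  # a topological ordering of the nodes

def mean_fn(node, vals):
    """Hypothetical conditional mean given parents and adjustment set."""
    if node == "C":
        return 0.0
    if node == "X":
        return 2.0 * vals["C"]
    return 3.0 * vals["X"] + vals["C"]

def encode(node, x, vals):
    """Stand-in for Enc-BDCM: map a value to its latent (exogenous) code."""
    return x - mean_fn(node, vals)

def decode(node, z, vals):
    """Stand-in for Dec-BDCM: map a latent code back, given conditioning values."""
    return z + mean_fn(node, vals)

def sample(latents, do=None):
    """Observational (do=None) or interventional decoding in topological order."""
    do = do or {}
    vals = {}
    for node in ORDER:
        # Intervened nodes are held fixed; the rest are decoded from their latents.
        vals[node] = do[node] if node in do else decode(node, latents[node], vals)
    return vals

def counterfactual(factual, do):
    """Encode factual data to latents, then decode under the intervention."""
    latents = {n: encode(n, factual[n], factual) for n in ORDER}
    return sample(latents, do)

factual = {"C": 1.0, "X": 2.0, "Y": 7.5}
cf = counterfactual(factual, do={"X": 0.0})
```

In this example the factual latent for `Y` is 0.5, so setting `X` to 0 while keeping `C` and the latent fixed gives a counterfactual `Y` of 1.5. An interventional sample differs only in that the latents are drawn fresh instead of being encoded from a factual record.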

The architecture supports multi-resolution functional variables and dynamic causal graphs, permitting flexible modeling of complex, high-dimensional datasets with spatial and temporal structure.

3. Modeling of Unmeasured Confounders with Spatial and Temporal Dynamics

PFD-BDCM distinctly models two categories of unmeasured confounders:

Unobserved explanatory confounders ($C_1$)
Unobserved explained confounders ($C_2$)

The region-specific structural equations for these groups are given by:

$$X_{C_2, ij} = \Gamma_i \cdot X_{C_1, ij} + U_{C_2, ij}$$

Here, $\Gamma_i$ is a structural coefficient matrix unique to region $i$, and $U_{C_2, ij}$ captures residual unexplained variability.

Temporal dependencies are encoded via conditional autoregressive processes with Kronecker-structured covariances:

$$\Sigma_{X_{C_1}, i} = D_{C_1, i} \otimes T_{X_{C_1}},\quad \Sigma_{U_{C_2}, i} = D_{C_2, i} \otimes T_{U_{C_2}}$$

Here $D_{C_h, i}$ ($h \in \{1, 2\}$) are regional temporal adjacency matrices, and the $T$ matrices encode cross-sectional covariances. This modeling explicitly integrates unmeasured confounders into the generative process, accounting for both spatial and temporal complexity.
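A small numerical sketch of both ingredients follows. All matrix values are hypothetical: `D` is an AR(1)-style matrix over three time points standing in for $D_{C_1, i}$, `T_mat` is a 2×2 cross-sectional covariance, and `Gamma` plays the role of the region-specific $\Gamma_i$.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a)) for k in range(len(b))
    ]

def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

rho = 0.5  # hypothetical temporal dependence strength
D = [[rho ** abs(s - t) for t in range(3)] for s in range(3)]   # temporal part
T_mat = [[1.0, 0.3], [0.3, 2.0]]                                 # cross-sectional part
Sigma = kron(D, T_mat)   # 6 x 6 covariance: 3 time points x 2 confounder components

# Structural equation X_C2 = Gamma_i * X_C1 + U_C2 for a single region and time.
Gamma = [[1.0, 0.5], [0.0, 2.0]]
x_c1 = [1.0, 2.0]
u_c2 = [0.1, -0.1]
x_c2 = [g + u for g, u in zip(matvec(Gamma, x_c1), u_c2)]
```

The Kronecker form keeps the parameter count small: a covariance over all time points and components is described by one temporal matrix and one cross-sectional matrix instead of a full dense matrix.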

4. Theoretical Guarantees: Reconstruction and Counterfactual Error Bounds

The framework provides explicit theoretical analysis relating reconstruction error to the accuracy of counterfactual estimates. Assuming monotonicity of the structural equation $f$ with respect to the exogenous noise $U$ and invertibility of the encoder $g$, the main result states:

$$d\left(h(g(X, X_{\mathcal{B}}), X_{\mathcal{B}}), X\right) \leq \tau \implies d\left(h(g(x^F, x_\pi^F), \gamma), f(\gamma, u)\right) \leq \tau$$

where $d(\cdot, \cdot)$ is an appropriate metric (e.g., the L2 norm) and $h$ is the decoder. This formalizes that reconstruction accuracy during training bounds the error in subsequent counterfactual prediction.
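A toy instance can make the implication concrete. The sketch below is an assumption-laden illustration, not a proof: `f` is an additive-noise mechanism (monotone in `u`), `g` is an exactly invertible encoder, and `h` is a decoder with a deliberate error bounded by the hypothetical level `TAU`.

```python
import math

TAU = 0.05  # hypothetical bound on the decoder's reconstruction error

def f(gamma, u):
    """True structural equation, monotone in the exogenous noise u."""
    return gamma + u

def g(x, parent):
    """Invertible encoder: recovers the noise from the value and its parent."""
    return x - parent

def h(z, parent):
    """Imperfect decoder: exact inverse plus an error of size at most TAU."""
    return z + parent + TAU * math.sin(z)

def reconstruction_error(x, parent):
    return abs(h(g(x, parent), parent) - x)

def counterfactual_error(x_f, parent_f, gamma):
    u = x_f - parent_f  # noise implied by the factual sample
    return abs(h(g(x_f, parent_f), gamma) - f(gamma, u))

grid = [i / 4.0 for i in range(-12, 13)]
max_rec = max(reconstruction_error(x, p) for x in grid for p in grid)
max_cf = max(counterfactual_error(x, p, gm) for x in grid for p in grid for gm in grid)
```

On this grid both `max_rec` and `max_cf` stay at or below `TAU`, matching the form of the stated result for this particular mechanism.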

In the multivariate case, the result is refined via an explicit Lipschitz constant $\mathcal{L}_h$ and the condition number of the encoder Jacobian $J_g$:

$$d\left(h(g(X, X_{\mathcal{B}}), X_{\mathcal{B}}), X\right) \leq \mathcal{L}_h \cdot \text{cond}(J_g) \cdot \tau$$

This analysis supports the reliability of counterfactual estimates, provided the underlying monotonicity and invertibility conditions are satisfied.

5. Empirical Performance on Synthetic and Real-World Data

PFD-BDCM has been empirically validated on synthetic spatio-temporal causal graphs with deliberately introduced spatially heterogeneous and temporally dependent unmeasured confounders. Metrics including Maximum Mean Discrepancy (MMD), Continuous Ranked Probability Score (CRPS), and Mean Squared Error (MSE) demonstrate superior performance compared to variants omitting backdoor adjustments.
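For reference, the three metrics can be computed from samples as in the sketch below. These are standard sample-based estimators for scalar data (a biased MMD² estimate with an RBF kernel and the empirical CRPS); the kernel choice and `bandwidth` value are assumptions, and the paper's exact evaluation protocol is not given in this summary.

```python
import math

def mse(pred, true):
    """Mean squared error between paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel for scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between two samples."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

def crps(samples, y):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| over the sample."""
    n = len(samples)
    term1 = sum(abs(s - y) for s in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (2.0 * n * n)
    return term1 - term2

generated = [0.0, 0.1, 0.2]
near = [0.05, 0.15, 0.25]
far = [5.0, 5.1, 5.2]
mmd_near = mmd2(generated, near)
mmd_far = mmd2(generated, far)
```

MSE compares point predictions, while MMD and CRPS compare distributions, which is why the latter two are suited to checking how well generated interventional or counterfactual samples match the target distribution. Lower is better for all three.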

In application to Chinese air pollution data (using CO₂ emission inventories and multi-pollutant datasets spanning multiple years and provinces), PFD-BDCM achieves lower discrepancy between generated and true distributions. These findings reflect improved causal fidelity when addressing confounders with explicit adjustment via functional diffusion-based modeling.

6. Applications and Methodological Significance

PFD-BDCM is directly applicable to domains such as:

Environmental science: Facilitates causal analysis of air pollutants across regions and time, managing partial observability and functional data.
Epidemiology and public health: Supports robust counterfactual inference with high-dimensional multi-resolution patient data, often exhibiting latent confounders.
Econometric and policy impact evaluation: Enables causal estimation from observational data collected at heterogeneous resolutions and multiple temporal scales.

The methodological contribution is to unite diffusion-based generative models with explicit backdoor adjustment sets and dynamic spatio-temporal causal graphs. This combination addresses limitations of existing approaches and extends the theoretical and practical toolkit for inference under partial functional specification, dynamic confounding, and high-dimensional functional data.

7. Future Directions and Implications

The integration and theoretical grounding of PFD-BDCM suggest several research avenues:

Scalable implementation in ultra-high-dimensional and functional data contexts.
Automated identification of backdoor adjustment sets within large graphs.
Real-time deployment in interactive decision support and policy analysis systems.

This model establishes a principled link between generative reconstruction quality and causal inference accuracy, offering a robust template for approaching causal queries in complex dynamic environments (Liu et al., 30 Aug 2025).

References

Liu et al. (30 Aug 2025). Partial Functional Dynamic Backdoor Diffusion-based Causal Model.
