Latent Causal Diffusion Model
- LacaDM is a generative framework that integrates structural causal models with denoising diffusion to support precise interventions, counterfactual inference, and robust generalization.
- Its architecture, including CausalDiffAE and node-wise models, encodes high-dimensional data into disentangled latent variables for controlled abduction–action–prediction processes.
- Empirical results highlight LacaDM’s superior performance with high disentanglement scores and improved predictive accuracy in tasks such as image generation and multiobjective reinforcement learning.
A Latent Causal Diffusion Model (LacaDM) integrates structural causal models (SCMs) with denoising diffusion probabilistic models (DDPMs), enabling interventions and counterfactual inference in high-dimensional domains such as images and sequential decision-making. This paradigm is instantiated across several recent lines of research, including causal autoencoding for counterfactual image generation, node-wise diffusion for general causal queries, and temporal-causal latent embeddings for multiobjective reinforcement learning. LacaDM unifies diffusion models’ generative capacity with explicit latent variables aligned to causal semantics, supporting precise, disentangled interventions and facilitating generalization to unseen environments.
1. Conceptual Foundations
Latent Causal Diffusion Models ground their generative and inferential procedures in SCM theory, pairing each variable (or group of variables) with a causal latent (often representing exogenous noise). The forward diffusion process degrades data into a latent form hypothesized to capture or approximate exogenous variables of the SCM, while the reverse process stochastically reconstructs the observable from latents, conditioned on the learned causal structure. This construction enables direct sampling (observational, interventional, and counterfactual), intervention via do-operations on the latent space, and counterfactual estimation by abduction–action–prediction across a broad class of domains (Komanduri et al., 27 Apr 2024, Chao et al., 2023).
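The abduction–action–prediction loop described above can be illustrated with a deliberately tiny, analytically invertible SCM. The two-variable mechanisms below are toy assumptions for exposition, not any paper's model:

```python
# Minimal two-variable SCM sketch: z1 := u1, z2 := f2(z1) + u2.
# Abduction inverts the mechanisms to recover exogenous noise from an
# observation; a LacaDM replaces these analytic inversions with diffusion.

def f2(z1):
    return 2.0 * z1  # toy linear mechanism (an assumption)

def generate(u1, u2):
    z1 = u1
    z2 = f2(z1) + u2
    return z1, z2

def abduct(z1, z2):
    # Recover exogenous noise (u1, u2) given the observed (z1, z2)
    return z1, z2 - f2(z1)

def counterfactual(z1, z2, z1_new):
    # abduction -> action (do(z1 = z1_new)) -> prediction
    _, u2 = abduct(z1, z2)
    return generate(z1_new, u2)

z1, z2 = generate(0.5, 0.1)        # observed world
cf = counterfactual(z1, z2, 1.0)   # counterfactual under do(z1 = 1.0)
```

Because the exogenous noise u2 is held fixed across worlds, the counterfactual z2 differs from the observed z2 exactly by the propagated effect of the intervention on z1.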
2. Architectural Design Principles
There are two principal architectural schemes for LacaDM:
- CausalDiffAE-style latent causal space models: High-dimensional data (e.g., images) are encoded into exogenous noise $u$, which is mapped through a neural SCM with causal adjacency matrix $A$. Each latent coordinate is given by $z_i = f_i(\mathrm{pa}(z_i), u_i)$, with the SCM structure encoded by $A$ and per-node functions $f_i$ parameterized by neural nets. A conditional DDIM or DDPM decoder reconstructs or generates data conditioned on the causally structured $z$, supporting hard interventions and counterfactual generation. Disentanglement is enforced using a label-aligned prior (Komanduri et al., 27 Apr 2024).
- Node-wise/counterfactually synchronous LacaDM (DCM) architectures: Each variable $x_i$ in the SCM is associated with a local diffusion model whose latent $z_i$ (a proxy for the exogenous noise $u_i$) is inferred via a noising/denoising process, usually with parent values conditioned at each step. Training is performed via a "simple" DDPM loss over all nodes, and counterfactuals are produced by replacing components in topological order according to the abduction–action–prediction paradigm (Chao et al., 2023). Reverse inference is parameterized by independent MLPs over each node, conditioned on parent and time information.
- Multiobjective RL (MORL) LacaDM implementations: Latent causal variables encode temporal-stochastic structure over agent–environment interaction, with causal dependencies modeled over previous time steps. Policy trajectories are diffused forward and then denoised by conditioning on these temporal latent variables, enabling policy adaptation and transfer in complex, shifting environments (Yan et al., 22 Dec 2025).
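As a structural sketch of the CausalDiffAE-style latent pathway, the snippet below propagates exogenous noise through a linear stand-in for the neural SCM under a known DAG adjacency matrix. The adjacency, weights, and linear per-node mechanisms are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy neural-SCM stand-in: z_i = W[i] * (parent contribution) + u_i,
# with parents selected by a known DAG adjacency matrix A (z1 -> z2).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
W = np.array([0.5, 0.5])  # per-node weights, standing in for neural nets

def scm_forward(u):
    z = np.zeros_like(u)
    for i in range(len(u)):           # topological order
        parent_contrib = A[:, i] @ z  # parents only, per DAG structure
        z[i] = W[i] * parent_contrib + u[i]
    return z

u = rng.normal(size=2)  # exogenous noise (produced by the encoder in LacaDM)
z = scm_forward(u)      # causally structured latents fed to the decoder
```

In CausalDiffAE the resulting $z$ conditions the DDIM/DDPM decoder; a hard intervention simply overwrites one coordinate of $z$ before the loop continues to its descendants.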
3. Mathematical Formulation
A generic LacaDM comprises:
- Forward diffusion (encoding): Progressive corruption of an observed variable (or set) into latent noise via
  $q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)$
  for $t = 1, \dots, T$. For discrete data, analogous Bernoulli or categorical transition kernels are used (Yan et al., 22 Dec 2025).
- Causal mechanism (SCM): Each causal latent is generated according to
  $z_i = f_{\theta_i}\big(\mathrm{pa}(z_i),\ u_i\big)$,
  where $f_{\theta_i}$ is a (typically small) neural network parameterized by $\theta_i$, and $\mathrm{pa}(z_i)$ denotes the parents of $z_i$ under a known DAG.
- Reverse diffusion (decoding): Reconstructs or generates data conditional on the causal latents $z$, e.g.,
  $p_\theta(x_{t-1} \mid x_t, z) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t, z),\ \sigma_t^2 I\big)$,
  implemented as deterministic or stochastic DDIM/DDPM steps (Komanduri et al., 27 Apr 2024, Chao et al., 2023, Yan et al., 22 Dec 2025).
- Learning objectives: CausalDiffAE combines the classic DDPM mean squared error (score-matching) loss with KL divergences on the exogenous and causal latents (i.e., $\beta$-VAE-style regularization), of the form
  $\mathcal{L} = \mathbb{E}\big[\lVert \epsilon - \epsilon_\theta(x_t, t, z)\rVert^2\big] + \beta\, D_{\mathrm{KL}}\big(q_\phi(u \mid x)\ \Vert\ p(u)\big)$
  (Komanduri et al., 27 Apr 2024).
- Interventions: Hard (atomic) interventions $do(z_i = c)$ replace the corresponding SCM equation by a fixed constant and affect only the descendants of $z_i$ in the DAG.
- Counterfactual inference (abduction–action–prediction): Involves encoding observed data to recover latent noise (abduction), interventional assignment of one or more latents (action), then reconstructing observable variables under this intervention by decoding (prediction) (Komanduri et al., 27 Apr 2024, Chao et al., 2023).
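The forward kernel above admits the standard closed form $q(x_t \mid x_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\big)$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The sketch below samples from it and inverts it given a noise estimate; the linear schedule values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t, eps):
    # Draw x_t ~ q(x_t | x_0) using externally supplied standard-normal noise
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def predict_x0(xt, t, eps_hat):
    # Invert the closed form given a (model-predicted) noise estimate eps_hat
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])

x0 = np.ones(4)
eps = rng.normal(size=4)
xt = q_sample(x0, 50, eps)
x0_rec = predict_x0(xt, 50, eps)  # exact when eps_hat equals the true noise
```

In abduction, running the forward process (or an encoder) on an observation yields the noise that plays the role of exogenous variables; `predict_x0` shows why an accurate noise estimate suffices to reconstruct the observable.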
4. Algorithms and Inference Procedures
Generic LacaDM inference is unified under a three-stage algorithmic scheme:
| Step | Description | Implementation Highlights |
|---|---|---|
| Abduction | Infer exogenous noise $u$ from the observation | Run forward diffusion or encoder |
| Action (intervention) | Modify latent(s) per desired intervention | Replace $z_i$ with the target value $c$ |
| Prediction (generation) | Decode/interpolate counterfactual using SCM | Run reverse diffusion, conditional or guided |
Algorithmic instantiations vary:
- Node-wise LacaDM: Encode each node (and parents) independently, perform interventions in topological order, decode each descendant using its distinct denoiser (Chao et al., 2023).
- CausalDiffAE: Jointly encode the exogenous noise $u$ from the observation $x$, compute all latents $z_i$ via the neural SCM, perform interventions by replacing each $z_i$ in the action set, then decode the reconstructed/counterfactual image via conditioned DDIM steps (Komanduri et al., 27 Apr 2024).
- MORL LacaDM: Infer temporal-causal latent trajectories from observed (policy, reward) pairs; conduct reverse diffusion to recover optimal or counterfactual policies under modified latent variables (Yan et al., 22 Dec 2025).
Pseudocode for counterfactual generation in CausalDiffAE and node-wise LacaDM is explicitly given in (Komanduri et al., 27 Apr 2024) and (Chao et al., 2023). MORL LacaDM pseudocode for training and inference is provided in (Yan et al., 22 Dec 2025).
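The three-stage scheme can be sketched end to end on a toy node-wise model. The dictionaries, mechanism functions, and analytic abduction below are illustrative stand-ins for per-node diffusion encoders and denoisers:

```python
# Node-wise counterfactual generation following abduction-action-prediction.
# Mechanisms are toy stand-ins; in a node-wise LacaDM each would be a small
# conditional diffusion model over that node.

mechanisms = {
    "x": lambda parents, u: u,                     # root node
    "y": lambda parents, u: 2.0 * parents["x"] + u,
}
parents_of = {"x": [], "y": ["x"]}
topo_order = ["x", "y"]

def abduct_node(name, obs):
    # Invert the toy mechanisms analytically; in LacaDM this is the
    # per-node forward diffusion / encoder
    if name == "x":
        return obs["x"]
    return obs["y"] - 2.0 * obs["x"]

def counterfactual(obs, do):
    u = {n: abduct_node(n, obs) for n in topo_order}   # abduction
    cf = {}
    for n in topo_order:                               # prediction
        if n in do:                                    # action
            cf[n] = do[n]
        else:
            cf[n] = mechanisms[n]({p: cf[p] for p in parents_of[n]}, u[n])
    return cf

obs = {"x": 0.5, "y": 1.2}
cf = counterfactual(obs, {"x": 1.0})   # do(x = 1.0)
```

Walking nodes in topological order guarantees that every parent value consumed by a mechanism is already the counterfactual one, which is exactly why the node-wise architecture conditions each denoiser on parent values.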
5. Empirical Results and Comparative Performance
Empirical results across domains consistently highlight three themes:
- Disentanglement and qualitative causal semantics: CausalDiffAE achieves state-of-the-art DCI scores (DCI ≈ 0.99) on MorphoMNIST and other benchmarks, outperforming CausalVAE and DiffAE, while producing interpretable causal interventions in which changes propagate along the correct DAG structure (Komanduri et al., 27 Apr 2024).
- Counterfactual accuracy: In both image and generic tabular settings, LacaDM achieves low mean absolute error (MAE) or MSE versus baselines for interventional and counterfactual queries. For example, on MorphoMNIST, CausalDiffAE yields MAE ≈ 0.39–0.50 (vs. CausalVAE: 3.76–13.23, DisDiffAE: 0.38–0.79) under relevant interventions; node-wise LacaDM achieves the lowest MMD/MSE on synthetic DAGs and real fMRI data (Komanduri et al., 27 Apr 2024, Chao et al., 2023).
- Multiobjective RL: LacaDM surpasses all tested baselines (e.g., DQN, PCN, various multiobjective algorithms) in hypervolume (HV), sparsity, and expected utility maximization across MOGymnasium’s suite of tasks. Ablation confirms the necessity of causal latent modeling for generalization and transfer (Yan et al., 22 Dec 2025).
6. Extensions and Theoretical Analysis
LacaDM frameworks have been extended to semi-supervised and weakly supervised settings, using classifier-free guidance (score interpolation) to control intervention strength and improve robustness to missing labels, as in CausalDiffAE (Komanduri et al., 27 Apr 2024). Temporal extensions for reinforcement learning employ causal disentanglement to distinguish environmental shifts from agent error, using normalizing flows to model exogenous noise and facilitate transfer (Yan et al., 22 Dec 2025).
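Classifier-free guidance via score interpolation reduces to a one-line combination of conditional and unconditional noise estimates, with the guidance weight controlling intervention strength. A minimal sketch (the array values are placeholders):

```python
import numpy as np

# Classifier-free guidance: blend conditional and unconditional noise
# predictions. w = 0 recovers the purely conditional model; larger w
# pushes samples harder toward the conditioning (e.g., an intervention).

def guided_eps(eps_cond, eps_uncond, w):
    return (1.0 + w) * eps_cond - w * eps_uncond

eps_c = np.array([1.0, 0.0])   # placeholder conditional prediction
eps_u = np.array([0.5, 0.5])   # placeholder unconditional prediction
eps_g = guided_eps(eps_c, eps_u, w=2.0)
```

Training such a model only requires randomly dropping the conditioning label, which is what makes the scheme robust to partially labeled (weakly supervised) data.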
Theoretical analysis (node-wise LacaDM) demonstrates that, under invertibility and independence assumptions, diffusion encoders recover true exogenous noise up to bijection, providing explicit bounds on counterfactual estimation error (Chao et al., 2023). Further, gradient-based anti-causal intervention schemes allow for guided reverse diffusion targeting minimal, plausible counterfactuals, as in Diff-SCM (Sanchez et al., 2022).
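As a loose sketch of gradient-guided generation in the Diff-SCM spirit, the snippet below replaces the full reverse-diffusion update with plain gradient ascent on a toy guidance term; the quadratic "classifier", target, and step sizes are all assumptions:

```python
import numpy as np

# Toy anti-causal guidance: nudge the iterate by the gradient of a target
# log-likelihood, scaled by s. In Diff-SCM this gradient would instead
# perturb the denoising direction at each reverse-diffusion step.

target = np.array([1.0, 1.0])

def grad_log_p_target(x):
    # Gradient of log N(x; target, I) up to a constant
    return -(x - target)

def guided_step(x, step_size, s):
    return x + step_size * s * grad_log_p_target(x)

x = np.zeros(2)
for _ in range(100):
    x = guided_step(x, step_size=0.1, s=1.0)
# x drifts toward the target: a minimal shift satisfying the guidance
```

The small, incremental updates are what bias the procedure toward minimal, plausible counterfactuals rather than arbitrary samples from the target class.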
7. Relationship to Broader Causal and Diffusion Modeling Literature
LacaDM subsumes and extends prior generative causal modeling approaches (e.g., CausalVAE, additively separable SCMs) by leveraging diffusion models’ expressive capacity and stochasticity. By disentangling and regularizing latent variables to match explicit causal semantics, these models overcome the interpretability and control limitations typical of standard diffusion probabilistic models.
The LacaDM approach is aligned with recent efforts in counterfactual inference, anti-causal guidance for image modeling, and representation learning with explicit intervention support. It establishes new empirical and theoretical baselines for causality-aware generative modeling across vision, tabular, and sequential decision-making domains (Komanduri et al., 27 Apr 2024, Chao et al., 2023, Yan et al., 22 Dec 2025, Sanchez et al., 2022).