
Causal Gaussian Process for Active Learning

Updated 4 January 2026
  • Causal Gaussian processes are nonparametric Bayesian models that integrate Gaussian process priors with structural causal models to capture nonlinear causal relationships.
  • They utilize closed-form posterior distributions and Monte Carlo techniques to evaluate intervention strategies based on expected information gain.
  • The framework employs GP-UCB optimization for continuous intervention selection, enabling efficient active learning of causal structures with quantified uncertainty.

A causal Gaussian process (GP) is a nonparametric Bayesian framework for modeling, inference, and active experimental design in causal inference problems where relationships between variables are non-linear and potentially complex. In this context, a causal GP integrates the structure of a directed acyclic graph (DAG) encoding the data-generating process with flexible, function-space priors on each variable's mechanism. This enables data-driven learning of both the network’s structure and the functional forms governing each node, with rigorous quantification of uncertainty and principled selection of informative interventions.

1. Structural Causal Model with Gaussian Process Priors

The foundation is a structural causal model (SCM) on real-valued variables $X_1, \ldots, X_d$, represented as a DAG $G$ with additive noise and non-linear mechanisms:

$$X_i = f_i\bigl(\mathrm{Pa}_i^G\bigr) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_i^2),$$

where $\mathrm{Pa}_i^G$ denotes the parents of $X_i$ in $G$. Each causal mechanism $f_i$ is assigned a GP prior

$$f_i \sim \mathcal{GP}\bigl(m_i(\cdot), k_i(\cdot, \cdot)\bigr).$$

Commonly, $m_i \equiv 0$ and squared-exponential kernels are adopted:

$$k_i(x, x') = \lambda_i \exp\Big(-\sum_{h=1}^{|\mathrm{Pa}_i|} \nu_{i,h}\,(x_h - x_h')^2\Big),$$

where $\lambda_i$ is the signal variance and the $\nu_{i,h}$ are inverse lengthscales. With a standard Gaussian likelihood, the marginal likelihood and posterior predictive distribution for each $f_i$ have closed forms. For observations $(\mathrm{pa}_i^{(n)}, x_i^{(n)})_{n=1}^N$, the log marginal likelihood is

$$\log p(\mathbf{x}_i \mid \mathbf{P}_i) = -\tfrac{1}{2}\,\mathbf{x}_i^\top (K_i + \sigma_i^2 I)^{-1}\mathbf{x}_i - \tfrac{1}{2}\log\bigl|K_i + \sigma_i^2 I\bigr| - \tfrac{N}{2}\log 2\pi,$$

where $(K_i)_{mn} = k_i(\mathrm{pa}_i^{(m)}, \mathrm{pa}_i^{(n)})$. The posterior mean and variance at a new parent configuration $\mathrm{pa}_*$ are

$$\mu_i(\mathrm{pa}_*) = k_i(\mathrm{pa}_*, \mathbf{P}_i)(K_i + \sigma_i^2 I)^{-1}\mathbf{x}_i, \qquad \sigma_i^2(\mathrm{pa}_*) = k_i(\mathrm{pa}_*, \mathrm{pa}_*) - k_i(\mathrm{pa}_*, \mathbf{P}_i)(K_i + \sigma_i^2 I)^{-1} k_i(\mathbf{P}_i, \mathrm{pa}_*).$$

This construction yields a fully nonparametric SCM in which both graph probabilities and all functional uncertainties are analytically tractable (Kügelgen et al., 2019).
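The following is a minimal NumPy sketch of these closed-form quantities for a single mechanism with the ARD squared-exponential kernel above; the function names, zero prior mean, and Cholesky-based implementation are illustrative choices rather than details taken from the source.

```python
import numpy as np

def se_kernel(A, B, signal_var, inv_lengthscales):
    """ARD squared-exponential kernel between parent configurations A (m x p) and B (n x p)."""
    sq_diff = (A[:, None, :] - B[None, :, :]) ** 2               # (m, n, p)
    return signal_var * np.exp(-np.einsum("mnp,p->mn", sq_diff, inv_lengthscales))

def gp_node_posterior(P_i, x_i, pa_star, signal_var, inv_lengthscales, noise_var):
    """Log marginal likelihood of one mechanism f_i and its posterior mean/variance at pa_star."""
    N = len(x_i)
    K = se_kernel(P_i, P_i, signal_var, inv_lengthscales) + noise_var * np.eye(N)
    L = np.linalg.cholesky(K)                                    # stable in place of a direct inverse
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, x_i))        # (K + sigma^2 I)^{-1} x_i
    log_ml = -0.5 * x_i @ alpha - np.log(np.diag(L)).sum() - 0.5 * N * np.log(2 * np.pi)
    k_star = se_kernel(pa_star, P_i, signal_var, inv_lengthscales)   # (1, N)
    mean = k_star @ alpha
    v = np.linalg.solve(L, k_star.T)
    var = se_kernel(pa_star, pa_star, signal_var, inv_lengthscales) - v.T @ v
    return log_ml, mean, var
```

Because the per-node terms factorize over the DAG, this same log marginal likelihood is the quantity that enters the posterior over candidate graphs $p(G \mid \mathcal{D})$.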

2. Bayesian Active Learning via Expected Information Gain

A key innovation is the formalization of optimal experimental design in causal structure learning. The goal is to select interventions $\mathrm{do}(X_j = x)$ that maximally reduce uncertainty about the causal graph $G$, as quantified by the expected information gain (EIG):

$$\mathrm{EIG}(j, x) = \mathbb{E}_{\mathbf{X}_{-j} \sim p(\mathbf{X}_{-j} \mid \mathcal{D}, \mathrm{do}(X_j = x))}\Big[\mathrm{KL}\bigl(p(G \mid \mathcal{D}, \mathbf{X}_{-j}, \mathrm{do}(X_j = x)) \,\Vert\, p(G \mid \mathcal{D})\bigr)\Big],$$

where $\mathcal{D}$ denotes the existing data and $\mathbf{X}_{-j}$ denotes all variables except $X_j$. In practice, since the expectation is over a continuous domain, a Monte Carlo approximation is used: samples $\mathbf{x}_{-j}^{(m)} \sim p(\mathbf{X}_{-j} \mid G, \mathrm{do}(X_j = x))$ under each candidate graph $G$ are drawn by ancestral sampling using the GP posteriors of the non-intervened nodes (Kügelgen et al., 2019).
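The sketch below shows one way the Monte Carlo EIG estimate can be organized when the posterior is maintained over an explicit list of candidate DAGs. The helpers `sample_do` (ancestral sampling of the non-intervened nodes from their GP posteriors under a given graph) and `log_lik_do` (the GP predictive log-likelihood of a hypothetical outcome) are assumed to exist and are not interfaces from the source.

```python
import numpy as np

def monte_carlo_eig(j, x, graphs, graph_log_post, sample_do, log_lik_do, n_samples=50):
    """Monte Carlo estimate of the expected information gain of do(X_j = x).

    graphs         : list of candidate DAGs
    graph_log_post : unnormalized log p(G | D), one entry per graph
    """
    prior = np.exp(graph_log_post - graph_log_post.max())
    prior /= prior.sum()                                          # current p(G | D)
    eig = 0.0
    for _ in range(n_samples):
        # Draw a hypothetical experimental outcome: pick a graph, then ancestrally sample its SCM.
        g = np.random.choice(len(graphs), p=prior)
        outcome = sample_do(graphs[g], j, x)
        # Graph posterior after (hypothetically) observing this outcome.
        log_post = graph_log_post + np.array([log_lik_do(G, j, x, outcome) for G in graphs])
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # KL( p(G | D, outcome, do(X_j = x)) || p(G | D) )
        eig += np.sum(post * (np.log(post + 1e-12) - np.log(prior + 1e-12)))
    return eig / n_samples
```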

3. Optimization over Continuous Interventions Using GP-UCB

Crucially, interventions need not be restricted to a finite discrete set: the intervention value $x$ for $\mathrm{do}(X_j = x)$ lies in a continuous domain. The EIG objective $f_j(x)$, estimated via Monte Carlo, is treated as a black-box function and maximized with a Bayesian optimization algorithm, specifically the Gaussian Process Upper Confidence Bound (GP-UCB). In this framework:

  • A surrogate GP is placed on $f_j(x)$.
  • The next query is

$$x_{t+1} = \arg\max_{x \in \mathcal{X}_j} \bigl[\mu_t(x) + \beta_t \sigma_t(x)\bigr],$$

where $\mu_t(x)$ and $\sigma_t(x)$ are the GP posterior mean and standard deviation after $t$ evaluations, and $\beta_t$ is an exploration parameter.

  • This process is run for each $j$, and the intervention $(j^*, x^*)$ with maximal $f_j(x_j^*)$ is selected for experimentation.

Bayesian optimization using GP-UCB enjoys sublinear regret bounds with respect to the best intervention value under mild smoothness assumptions, yielding highly efficient discovery of informative interventions (Kügelgen et al., 2019).
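A minimal sketch of the GP-UCB loop over a one-dimensional intervention range is given below; the unit-lengthscale surrogate kernel, the grid-based acquisition maximization, and the fixed exploration parameter are simplifying assumptions for illustration.

```python
import numpy as np

def gp_ucb_maximize(objective, lower, upper, n_iter=20, beta=2.0, noise_var=1e-4):
    """Maximize a noisy black-box objective (e.g., a Monte Carlo EIG estimate) with GP-UCB."""
    def k(a, b):                                            # squared-exponential surrogate kernel
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

    X = list(np.random.uniform(lower, upper, size=2))       # small random initial design
    y = [objective(x) for x in X]
    grid = np.linspace(lower, upper, 200)                   # candidate intervention values
    for _ in range(n_iter):
        Xa, ya = np.array(X), np.array(y)
        K = k(Xa, Xa) + noise_var * np.eye(len(Xa))
        Ks = k(grid, Xa)
        mu = Ks @ np.linalg.solve(K, ya)                    # surrogate posterior mean on the grid
        var = 1.0 - np.einsum("ij,ij->i", Ks, np.linalg.solve(K, Ks.T).T)
        ucb = mu + beta * np.sqrt(np.maximum(var, 0.0))     # upper confidence bound acquisition
        x_next = grid[np.argmax(ucb)]
        X.append(x_next)
        y.append(objective(x_next))
    best = int(np.argmax(y))
    return X[best], y[best]
```

In the full active-learning loop, `objective` would be the Monte Carlo EIG estimate for a fixed candidate variable $j$, and the routine is run once per variable before the best $(j^*, x^*)$ is selected.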

4. Algorithmic Workflow and Computational Considerations

The overall algorithm proceeds as follows:

```
Initialize: prior P(G), empty dataset D.
For t = 1…T do
  1. Update GP hyperparameters (e.g., type-II ML), compute P(G | D).
  2. For each j = 1…d:
    • Build a BO surrogate for f_j(x) ≔ MonteCarloEIG(j, x; D).
    • Run GP-UCB to find x_j* ≈ argmax_x f_j(x).
    • Record v_j = f_j(x_j*).
  3. Pick (j*, x*) = argmax_j v_j.
  4. Perform experiment do(X_{j*} = x*). Observe x_{-j*} ∼ p(X_{-j*} | do(X_{j*} = x*)).
  5. Augment D ← D ∪ { (j*, x*, x_{-j*}) }.
End for
Output posterior P(G | D) and all GP posteriors.
```

Closed-form GP updates and Monte Carlo ancestral sampling permit rapid computation in moderate dimensions. For large $d$, exhaustive graph enumeration is impractical and MCMC over DAG space becomes necessary.
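When the candidate DAGs can be enumerated, the posterior update P(G | D) in step 1 reduces to combining per-node GP log marginal likelihoods, as in the sketch below; `node_log_ml` stands in for the closed-form expression of Section 1, and the bookkeeping needed to exclude a node's own interventional samples from its mechanism's likelihood is omitted for brevity.

```python
import numpy as np

def graph_posterior(graphs, log_prior, node_log_ml, data):
    """Posterior over an enumerable set of candidate DAGs.

    graphs      : list of DAGs, each a dict {node: tuple of parent nodes}
    log_prior   : array of log p(G), one entry per graph
    node_log_ml : node_log_ml(node, parents, data) -> GP log marginal likelihood (assumed helper)
    """
    # log p(G | D) = log p(G) + sum_i log p(x_i | Pa_i^G, D) + const.
    log_post = np.array([
        lp + sum(node_log_ml(i, pa, data) for i, pa in G.items())
        for G, lp in zip(graphs, log_prior)
    ])
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```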

5. Theoretical and Empirical Properties

The causal GP framework possesses several important theoretical properties:

  • Exact updates to posteriors over graph and functional uncertainties due to closed-form expressions for GP marginal likelihoods.
  • Provably no-regret intervention optimization via GP-UCB.
  • Exponential complexity in the number of variables $d$ for full graph enumeration, necessitating scalable alternatives for large-scale problems.

Empirically, in a canonical bivariate setting ($d = 2$) with ground-truth model $Y = 2\tanh(X) + \epsilon$, the active scheme started from a few observational samples, alternated interventions on $X$ and $Y$ chosen by Bayesian optimization, and recovered the correct causal direction $X \to Y$ with $>99\%$ posterior confidence after only ten interventions (Kügelgen et al., 2019).
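For reference, the ground-truth SCM of this experiment can be simulated as follows; the standard-normal root distribution and the noise level are illustrative assumptions, not values reported in the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bivariate(n, do_x=None, do_y=None, noise_std=0.1):
    """Sample from the ground-truth SCM X -> Y with Y = 2*tanh(X) + eps."""
    x = np.full(n, float(do_x)) if do_x is not None else rng.normal(size=n)
    y = np.full(n, float(do_y)) if do_y is not None else 2 * np.tanh(x) + rng.normal(scale=noise_std, size=n)
    return x, y

x_obs, y_obs = sample_bivariate(5)              # a few observational samples
x_do, y_do = sample_bivariate(1, do_x=3.0)      # do(X = 3): Y still responds, supporting X -> Y
x_cut, y_cut = sample_bivariate(1, do_y=3.0)    # do(Y = 3): X is unaffected, ruling out Y -> X
```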

6. Significance and Impact

Causal Gaussian Process frameworks unify nonparametric causal modeling, principled uncertainty quantification, and optimal experimental design in continuous domains. They permit active learning of causal structure in settings where nonlinear, non-Gaussian mechanisms may govern variable relationships, far beyond the scope of traditional linear or discretized causal discovery. The EIG-based intervention strategy and GP surrogates for optimization are foundational for modern active causal learning protocols.

Their application spans causal structure learning, functional mechanism estimation, and design of interventions in scientific, engineering, and healthcare domains where interventions may be continuous-valued and experimental resources are limited. The active causal GP paradigm is central to ongoing developments in theory and scalable computation for structure learning under uncertainty.

Extensions include multi-task causal Gaussian processes for joint learning of responses to multiple interventions (Aglietti et al., 2020), causal GPs for nonparametric functional inference in panel data (Vega et al., 7 Jul 2025), and Bayesian optimization using causal effect posteriors for targeted experimentation. Causal GPs are also foundational in frameworks that integrate observational and interventional data, combine with kernel-based matching structures for doubly robust estimation (1901.10359), and allow for the handling of latent confounding via hierarchical Bayesian models with structured latent variables (Witty et al., 2020). The main Causal GP structure—nonparametric structural equations with GPs, explicit EIG-based design, and Bayesian optimization for interventions—remains central to all these developments.
