Causality-DeepONet: Causal Convolution Models
- Causality-DeepONet is a modeling paradigm that integrates causal process convolution with deep operator networks to represent systems with dynamic dependencies and spatio-temporal phenomena.
- It enforces causality by restricting outputs to rely solely on present and past inputs, resulting in unique spectral behaviors and enhanced physical realism.
- Structured variational inference and sampling techniques are employed to achieve scalable learning and reliable uncertainty quantification in complex, high-dimensional applications.
Causality-DeepONet refers to a research area and modeling paradigm at the intersection of causal signal modeling and deep operator networks under the process convolution framework. It centers on the representation, inference, and learning of systems whose outputs are generated by causal convolution operations—i.e., outputs depend only on present or past stochastic inputs, not future ones. These models, particularly the Causal Gaussian Process Convolution Model (CGPCM) and its multi-stage generalizations, provide a rigorously grounded approach to encoding dynamic causal dependencies, rich spectral structures, and spatio-temporal phenomena.
1. Mathematical Foundations of Causal Convolution Modeling
The fundamental construct is a process generated by convolving white noise with a causal filter. The single-output CGPCM is formalized by

$$f(t) = \int_{-\infty}^{t} h(t - \tau)\, x(\tau)\, \mathrm{d}\tau,$$

where $x$ is Gaussian white noise and $h$ is a causal filter with support on $[0, \infty)$. The prior on $h$ itself is typically a GP, inducing a doubly nonparametric model for $f$ (Bruinsma et al., 2018). The induced covariance function for the output is

$$k_{f \mid h}(t, t') = \int_{\mathbb{R}} h(t - \tau)\, h(t' - \tau)\, \mathrm{d}\tau$$

(expressed for stationary $f$, where it depends only on the lag $t - t'$).
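The generative recipe above can be approximated on a grid by a causal discrete convolution. The following sketch is a minimal illustration, not an implementation from the cited papers: it draws a discretized white-noise excitation, convolves it with an assumed exponentially decaying causal filter, and evaluates the induced covariance numerically. The filter shape, grid sizes, and variable names are illustrative assumptions.

```python
# Minimal sketch: discretized draw from a causal convolution process, with an
# assumed exponentially decaying causal filter h (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

dt = 0.01                      # grid spacing for the Riemann-sum approximation
t = np.arange(0.0, 10.0, dt)   # observation times
tau = np.arange(0.0, 3.0, dt)  # filter support grid: h(tau) = 0 for tau < 0

def causal_filter(tau, rate=2.0):
    """Illustrative causal filter with support on [0, inf)."""
    return np.exp(-rate * tau)

h = causal_filter(tau)
x = rng.normal(0.0, 1.0 / np.sqrt(dt), size=t.size)  # discretized white noise

# f(t) = int_{-inf}^{t} h(t - s) x(s) ds  ~  causal discrete convolution
f = np.convolve(x, h, mode="full")[: t.size] * dt

# Induced covariance k(tau) = int h(s) h(s + tau) ds, evaluated for lags >= 0
k = np.correlate(h, h, mode="full")[h.size - 1 :] * dt
print(f[:5], k[:5])
```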
Multi-stage and chain constructions extend this principle to arbitrary depth. In the "process convolution chain" formalism, an initial innovation process (e.g., white noise) is sequentially smoothed by a sequence of kernels $h_1, \dots, h_J$,

$$f_j(t) = \int h_j(t - \tau)\, f_{j-1}(\tau)\, \mathrm{d}\tau, \qquad j = 1, \dots, J,$$

creating an effective kernel $h_J * \cdots * h_1$ that encodes composite mechanistic structure (Scharf et al., 2017).
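On a grid, a process convolution chain can be mimicked by composing discrete convolutions; the effective kernel is then the convolution of the stage kernels. The sketch below is a minimal illustration with assumed kernel shapes, not those used by Scharf et al. (2017).

```python
# Minimal sketch of a process convolution chain: white noise is smoothed by a
# sequence of kernels; the effective kernel is the convolution of all stages.
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01
tau = np.arange(0.0, 2.0, dt)

stages = [
    np.exp(-5.0 * tau),                       # stage 1: fast exponential smoothing
    np.exp(-0.5 * (tau - 0.5) ** 2 / 0.01),   # stage 2: delayed Gaussian bump
]

# Effective kernel h_eff = h_2 * h_1 (discrete convolution, Riemann-scaled)
h_eff = stages[0]
for h_j in stages[1:]:
    h_eff = np.convolve(h_eff, h_j) * dt

# Equivalently, smooth the innovation process stage by stage
x = rng.normal(0.0, 1.0 / np.sqrt(dt), size=2000)
f = x
for h_j in stages:
    f = np.convolve(f, h_j)[: x.size] * dt

print(h_eff.shape, f[:5])
```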
2. Causality Constraints and Spectral Implications
Causality is enforced by imposing the support constraint $h(\tau) = 0$ for $\tau < 0$, which restricts the convolution to only present and past innovations. This causal restriction gives rise to a host of distinctive statistical behaviors (a numerical illustration follows the list below):
- Spectral Bias: The one-sidedness of the filter implies that the induced power spectral density exhibits nontrivial phase responses, enabling complex spectral shapes and asymmetrical log-spectra unavailable to symmetric (acausal) kernels like SE or Matérn (Bruinsma et al., 2018, Bruinsma et al., 2022).
- Differentiability and Path Roughness: The differentiability of the sample path depends critically on the behavior of $h$ at the origin. If $h(0) \neq 0$, $f$ is nowhere differentiable; if $h(0) = 0$, it is differentiable. Thus, causal convolution models can interpolate between smooth and locally Brownian signals (Bruinsma et al., 2022).
- Physical Realism: Causal kernels preclude instantaneous propagation of innovation, aligning the model with physical time’s arrow and allowing precise modeling of delays and attenuations fundamental in many scientific systems.
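The operational content of the causality constraint can be checked numerically: a causal filter produces no response before an innovation arrives, whereas a symmetric (acausal) kernel does. The sketch below uses illustrative filter shapes and grid sizes.

```python
# Minimal sketch: response of a causal vs. a symmetric (acausal) filter to a
# single impulse innovation at t = 5. Filter shapes are illustrative assumptions.
import numpy as np

dt = 0.01
t = np.arange(0.0, 10.0, dt)
x = np.zeros(t.size)
x[int(5.0 / dt)] = 1.0 / dt            # unit impulse innovation at t = 5

lags = np.arange(-2.0, 2.0, dt)        # two-sided lag grid for the filters
zero_lag = int(2.0 / dt)               # index of lag 0 in `lags`
h_causal = np.where(lags >= 0, np.exp(-2.0 * lags), 0.0)  # support on [0, inf)
h_symmetric = np.exp(-lags ** 2)                           # SE-like, acausal

def filter_output(x, h, zero_lag):
    # f[n] = sum_m x[m] * h(t[n] - t[m]) * dt, with h[zero_lag] holding lag 0
    full = np.convolve(x, h, mode="full") * dt
    return full[zero_lag : zero_lag + x.size]

f_causal = filter_output(x, h_causal, zero_lag)
f_acausal = filter_output(x, h_symmetric, zero_lag)

before = t < 5.0
print("causal output before the impulse: ", np.abs(f_causal[before]).max())   # exactly 0
print("acausal output before the impulse:", np.abs(f_acausal[before]).max())  # clearly > 0
```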
3. Inference Frameworks: Variational Approaches and Structured Sampling
Inference in Causality-DeepONet models typically leverages structured variational methods:
- Mean-field ELBO: Factorized Gaussian approximations over inducing variables for both the causal filter and the innovation process—enabling closed-form coordinate ascent updates for the variational posterior (Bruinsma et al., 2018).
- Structured Mean-field and Gibbs Sampling: To mitigate the undercalibrated uncertainties intrinsic to mean-field approximations, a Gibbs sampler alternates between the conditional Gaussians for the filter and the excitation (inducing variables), attaining sharper posteriors with less variational bias (Bruinsma et al., 2022); a minimal sketch of this alternation follows the list below.
- Hyperparameter Learning: ELBO maximization proceeds via stochastic or automatic differentiation gradients passed through the linear algebra solving steps of variational coordinate ascent (e.g., the “∇-through-solve” method) (Bruinsma et al., 2018).
- Multi-stage Chains and Block-Metropolis: In multi-stage chain models, block updates and collapse of continuous latent GPs are employed to ensure tractable mixing and scalable computations in the presence of high-dimensional latent structures (Scharf et al., 2017).
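The alternation between conditional Gaussians exploits the bilinearity of the convolution: conditional on the filter the model is linear-Gaussian in the excitation, and vice versa. The sketch below illustrates this blocked-Gibbs structure on a dense, discretized toy model with assumed standard-normal priors; the cited papers instead parameterize both quantities with inducing variables and work in continuous time.

```python
# Minimal sketch of blocked Gibbs for a discretized causal convolution model
# y = (h * x) + noise. Priors, sizes, and the dense parameterization are
# illustrative assumptions, not the inducing-point scheme of the cited papers.
import numpy as np

rng = np.random.default_rng(2)
n, m = 200, 30                 # observations, filter taps
noise_var = 0.05

def conv_matrix(v, n_rows, n_cols):
    """Matrix C with (C w)[t] = sum_k v[k] * w[t - k] (causal discrete convolution)."""
    C = np.zeros((n_rows, n_cols))
    for k in range(len(v)):
        rows = np.arange(k, n_rows)
        cols = rows - k
        keep = cols < n_cols
        C[rows[keep], cols[keep]] = v[k]
    return C

def sample_gaussian_conditional(A, y, noise_var, prior_var=1.0):
    """Draw from p(z | y) for y = A z + eps, eps ~ N(0, noise_var I), z ~ N(0, prior_var I)."""
    P = A.T @ A / noise_var + np.eye(A.shape[1]) / prior_var   # posterior precision
    L = np.linalg.cholesky(P)
    mean = np.linalg.solve(P, A.T @ y / noise_var)
    return mean + np.linalg.solve(L.T, rng.normal(size=A.shape[1]))

# Synthetic data from a "true" filter and excitation
h_true = np.exp(-0.2 * np.arange(m))
x_true = rng.normal(size=n)
y = conv_matrix(h_true, n, n) @ x_true + np.sqrt(noise_var) * rng.normal(size=n)

# Blocked Gibbs: each conditional is Gaussian because the model is linear in one
# block given the other (convolution is symmetric in its two arguments)
h = np.zeros(m)
h[0] = 1.0
x = rng.normal(size=n)
for _ in range(100):
    x = sample_gaussian_conditional(conv_matrix(h, n, n), y, noise_var)  # p(x | h, y)
    h = sample_gaussian_conditional(conv_matrix(x, n, m), y, noise_var)  # p(h | x, y)

print("filter sample (first taps):", np.round(h[:5], 2))
```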
4. Generalizations: Multi-Output, Spatio-Temporal, and Composite Chains
The Causality-DeepONet formalism admits substantial generalization:
- Multi-Output: Vector-valued extensions employ shared and idiosyncratic convolution kernels for each output component, supporting the heterogeneous correlations and cross-covariance structures required in joint modeling of dependent components (Sofro et al., 2017, Yue et al., 2019); a minimal cross-covariance sketch follows the list below.
- Spatio-Temporal Models: By applying causal convolution kernels in both space and time, models address non-separable dependencies in spatio-temporal systems (e.g., remote sensing, animal movement). Infinite-dimensional state-space representations and Galerkin truncations allow for finite-dimensional surrogate inference (Zhang et al., 1 Dec 2025).
- Process Convolution Chains: Sequential application of interpretable smoothing kernels enables decompositions where each stage models a different physical or statistical mechanism (e.g., inertia, social interaction, spatial diffusion) within a unified chain, generalizing basic CGPCM and enhancing interpretability and modeling flexibility (Scharf et al., 2017).
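In the multi-output case, two outputs driven by a shared innovation through output-specific causal kernels $h_i$ and $h_j$ have cross-covariance $k_{ij}(\tau) = \int h_i(u)\, h_j(u + \tau)\, \mathrm{d}u$. The sketch below evaluates this numerically for two illustrative kernels; it omits the idiosyncratic (output-specific) convolution terms used in the cited models.

```python
# Minimal sketch: cross-covariances induced by convolving a single shared white
# noise process with output-specific causal kernels h1, h2 (illustrative shapes).
import numpy as np

dt = 0.01
s = np.arange(0.0, 5.0, dt)            # filter support grid, [0, 5)

h1 = np.exp(-1.0 * s)                  # output 1: fast exponential decay
h2 = s * np.exp(-2.0 * s)              # output 2: delayed, smoother response

def cross_cov(hi, hj, dt):
    """k_ij(tau) = int h_i(u) h_j(u + tau) du for tau = 0, dt, 2*dt, ..."""
    full = np.correlate(hj, hi, mode="full") * dt
    return full[hi.size - 1 :]

k11 = cross_cov(h1, h1, dt)            # auto-covariance of output 1
k12 = cross_cov(h1, h2, dt)            # cross-covariance between outputs 1 and 2
print(k11[0], k12[0])
```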
5. Empirical Results and Applications
Empirical demonstrations on synthetic and real-world time series highlight the efficacy of causality-enforced convolution models:
- Predictive Performance: In time series with built-in delays or echoes, the CGPCM outperforms acausal GPCM and standard kernels. For example, in AR(2) and sawtooth-plus-noise tasks, CGPCM achieves lower RMSE and matches parametric causal kernels in fidelity (Bruinsma et al., 2018).
- Uncertainty Quantification: Causal models avoid future leakage, yielding interval estimates that respect epistemic constraints in dynamic prediction—critical in domains such as signal processing and industrial sensor fault detection (Bruinsma et al., 2018).
- Spatio-Temporal Monitoring: In remote sensing and wildfire plume tracking, convolution-generated models coupled with state-space representations offer robust tools for anomaly detection via tracking high-energy modes in first derivatives (Zhang et al., 1 Dec 2025).
- Multivariate Poisson Regression: Convolved GP regression effectively models dependent count data with complex cross-correlation, as demonstrated in epidemiological mortality prediction, supporting both accurate estimation and flexible covariance specification (Sofro et al., 2017).
- Structured Survival Modeling: MGCP-Cox frameworks extend convolution modeling to joint longitudinal-survival inference, achieving scalability and regularization through inducing-point variational approximations (Yue et al., 2019).
6. Computational Complexity and Implementation
Computational strategies balance model richness with tractability:
- Reduced-Rank Approximation: Discretization of convolution integrals onto grids of $m$ knots reduces the cubic $\mathcal{O}(n^3)$ cost of exact GP inference to $\mathcal{O}(n m^2)$, critical for scalability (Scharf et al., 2017); see the sketch after this list.
- Block Structure Exploitation: Symmetries and support properties of convolution kernels (e.g., spatial vs temporal smoothing) allow block-diagonal linear algebra and further reductions to near-linear cost (Scharf et al., 2017, Zhang et al., 1 Dec 2025).
- Kalman Filtering for State-Space Reduction: Galerkin projections translate infinite-dimensional GPs to finite-dimensional state-space models, supporting efficient Kalman filter-based likelihood computation and error bounding (Zhang et al., 1 Dec 2025).
- Variational Inference Algorithms: ELBO maximization, parameter estimation and prediction are implemented via alternating closed-form updates and gradient-based steps, exploiting the low-rank structure induced by convolution and sparse induction (Yue et al., 2019).
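The reduced-rank idea can be made concrete by placing the innovation process on $m$ knots, so that the convolution becomes an $n \times m$ design matrix and fitting reduces to small normal-equation-sized linear algebra. The sketch below uses an assumed kernel, knot layout, and penalized least-squares update as a stand-in for the full Bayesian computation.

```python
# Minimal sketch of the reduced-rank (knot-based) approximation: the continuous
# innovation process is replaced by m knot coefficients, so the convolution
# becomes an n x m design matrix and fitting costs O(n m^2) instead of O(n^3).
# Knot layout, kernel, and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, m = 2000, 50
t = np.linspace(0.0, 10.0, n)          # observation times
knots = np.linspace(0.0, 10.0, m)      # knot locations for the innovations

def causal_kernel(lag, rate=1.5):
    """Illustrative causal smoothing kernel: zero for negative lags."""
    return np.where(lag >= 0, np.exp(-rate * lag), 0.0)

# Design matrix H[i, j] = h(t_i - knot_j): f(t_i) ~= sum_j h(t_i - knot_j) x_j
H = causal_kernel(t[:, None] - knots[None, :])

# Simulate observations and recover knot coefficients by penalized least squares
# (a stand-in for the full Bayesian update)
x_true = rng.normal(size=m)
y = H @ x_true + 0.1 * rng.normal(size=n)

A = H.T @ H + 1e-2 * np.eye(m)         # m x m Gram matrix, built in O(n m^2)
x_hat = np.linalg.solve(A, H.T @ y)    # O(m^3) solve
print(np.round(x_hat[:5], 2), np.round(x_true[:5], 2))
```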
7. Future Directions and Theoretical Significance
Potential extensions outlined include:
- Generalization to Non-Gaussian Likelihoods: Poisson, Bernoulli, and other observation models via the same inducing-point variational framework (Bruinsma et al., 2018).
- Spatio-Causal Operators and Hybrid Models: Embedding CGPCM or process convolution chain priors within latent-force models or coupled with known physical PDEs for system identification and control (Bruinsma et al., 2018).
- Operator Learning: The synthesis of causal convolution GPs and operator networks forms a conceptual basis for deep operator learning in dynamic, stochastic, and spatio-temporal systems—suggesting applications in process modeling, real-time anomaly detection, and uncertainty-aware forecasting.
Causality-DeepONet thus comprises a rigorously founded class of models for causal, convolution-generated phenomena, supporting both theoretical investigation and practically effective inference, with broad applicability across signal processing, time series analysis, spatio-temporal modeling, and joint multivariate outputs. These frameworks unify causal constraints, process convolution theory, variational learning, and state-space approximations—paving the way for detailed mechanistic modeling and scalable, uncertainty-aware data analysis.