Generalised Variational Inference in DeepONets
- Generalised variational inference replaces the standard KLD with a tunable Rényi α-divergence, balancing mode-seeking and mass-covering behaviors.
- Embedding Bayesian neural networks in both branch and trunk architectures enables calibrated uncertainty quantification and enhanced predictive accuracy.
- The framework shows robust performance in mechanical operator tasks by reducing NMSE and NLL through flexible hyperparameter tuning of α.
The Alpha-VI DeepONet framework is an operator learning methodology that integrates generalised variational inference (GVI) into DeepONets using Rényi’s α-divergence instead of the standard Kullback–Leibler divergence (KLD). By embedding Bayesian neural networks (BNNs) as branch and trunk networks, the approach enables calibrated uncertainty quantification in operator learning. This divergence choice increases robustness to prior misspecification and allows for flexible control over the trade-off between data fidelity and regularisation, with the robustness hyperparameter α tunable for optimal performance in different problem settings. The framework has been evaluated on a suite of mechanical operator learning tasks, demonstrating improved predictive accuracy and uncertainty quantification relative to both deterministic and KLD-based variational DeepONets.
1. Divergence Choice: Rényi’s α-Divergence versus KLD
In traditional variational inference, the Kullback–Leibler divergence between the variational posterior and the target posterior is minimized. The KLD’s mode-seeking character and high sensitivity to prior misspecification can result in overconfident or biased posteriors. Rényi’s α-divergence between distributions q and p is defined as:

$$ D_\alpha(q \,\|\, p) \;=\; \frac{1}{\alpha - 1} \log \int q(\theta)^{\alpha}\, p(\theta)^{1-\alpha}\, d\theta, \qquad \alpha > 0,\ \alpha \neq 1. $$
This divergence interpolates between mass-covering behavior (for α < 1), enhancing robustness and uncertainty quantification, and mode-seeking behavior (for α > 1), which prioritizes fitting data in high-density regions. The limit α → 1 recovers the standard KLD. The ability to select α provides a mechanism to mitigate prior misspecification, adapting the inference procedure to the uncertainty characteristics or multimodality of the underlying operator learning task (Knoblauch et al., 2019).
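To make this interpolation concrete, the sketch below (not taken from the paper) evaluates the closed-form Rényi α-divergence between two univariate Gaussians and checks numerically that it approaches the KLD as α → 1; the function names and the specific Gaussian parameters are illustrative.

```python
import numpy as np

def renyi_alpha_gaussian(mu_q, sigma_q, mu_p, sigma_p, alpha):
    """Closed-form Rényi alpha-divergence D_alpha(q || p) between univariate
    Gaussians q = N(mu_q, sigma_q^2) and p = N(mu_p, sigma_p^2).
    Valid for alpha != 1 with (1 - alpha)*sigma_q^2 + alpha*sigma_p^2 > 0."""
    var_star = (1.0 - alpha) * sigma_q**2 + alpha * sigma_p**2
    quad = alpha * (mu_q - mu_p) ** 2 / (2.0 * var_star)
    log_term = ((1.0 - alpha) * np.log(sigma_q) + alpha * np.log(sigma_p)
                - 0.5 * np.log(var_star)) / (alpha - 1.0)
    return quad + log_term

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for univariate Gaussians, the alpha -> 1 limit."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2) - 0.5)

# As alpha -> 1 the divergence approaches the KLD; alpha < 1 is the
# mass-covering regime and alpha > 1 the mode-seeking regime discussed above.
for alpha in [0.5, 0.99, 1.25, 2.0]:
    d = renyi_alpha_gaussian(0.0, 0.5, 1.0, 1.0, alpha)
    print(f"alpha={alpha:4.2f}  D_alpha={d:.4f}")
print(f"KL limit     D_1    ={kl_gaussian(0.0, 0.5, 1.0, 1.0):.4f}")
```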
2. Bayesian Neural Networks for Uncertainty Quantification
The Alpha-VI DeepONet replaces deterministic branch and trunk networks with Bayesian neural networks—specifically, BNNs with mean-field Gaussian priors over weights. Each forward pass samples a set of weights from the variational posterior, producing not only predictive means but also predictive variances. For example, for an input function u queried at a location y, the network outputs a mean μ(u, y) and a standard deviation σ(u, y) parameterising a predictive distribution:

$$ G(u)(y) \sim \mathcal{N}\big(\mu_\theta(u, y),\, \sigma_\theta^2(u, y)\big). $$
This output structure enables meaningful uncertainty quantification in predictions, which is evaluated via negative log-likelihood (NLL) and normalized mean squared error (NMSE) against held-out data. The result is a probabilistic, rather than point-estimate, characterization of operator outputs.
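As a rough illustration of the mean-field construction, the NumPy sketch below samples weights for small branch and trunk networks via the reparameterisation trick and estimates a predictive mean and standard deviation by Monte Carlo over weight samples. It is a minimal stand-in, not the paper’s architecture: the layer sizes, activations, and the use of sampling alone (rather than an explicit variance head) are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mean_field(n_in, n_out):
    """Variational parameters of a mean-field Gaussian over a dense layer:
    a mean and a log-standard-deviation per weight and bias."""
    return {
        "W_mu": rng.normal(0.0, 0.1, (n_in, n_out)),
        "W_logsig": np.full((n_in, n_out), -3.0),
        "b_mu": np.zeros(n_out),
        "b_logsig": np.full(n_out, -3.0),
    }

def sample_dense(params, x):
    """One stochastic forward pass: sample weights from the variational
    posterior (reparameterisation trick) and apply the affine map."""
    W = params["W_mu"] + np.exp(params["W_logsig"]) * rng.standard_normal(params["W_mu"].shape)
    b = params["b_mu"] + np.exp(params["b_logsig"]) * rng.standard_normal(params["b_mu"].shape)
    return x @ W + b

# Hypothetical sizes: the branch sees u at m sensor points, the trunk sees a query y.
m, p = 50, 32                       # sensors, latent width
branch = [init_mean_field(m, 64), init_mean_field(64, p)]
trunk  = [init_mean_field(1, 64), init_mean_field(64, p)]

def forward(u_sensors, y):
    """One sampled DeepONet output G(u)(y) = <branch(u), trunk(y)>."""
    b = np.tanh(sample_dense(branch[0], u_sensors))
    b = sample_dense(branch[1], b)
    t = np.tanh(sample_dense(trunk[0], y))
    t = sample_dense(trunk[1], t)
    return np.sum(b * t, axis=-1)

# Monte Carlo over weight samples yields a predictive mean and spread.
u = np.sin(np.linspace(0, np.pi, m))[None, :]   # one input function
y = np.array([[0.3]])                           # one query location
samples = np.array([forward(u, y) for _ in range(100)])
print("predictive mean", samples.mean(), "predictive std", samples.std())
```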
3. Variational Objective and Performance Metrics
The variational free energy (the training objective) is generalised from the standard VI formulation by replacing the KLD regulariser with Rényi’s α-divergence:

$$ \mathcal{L}(\lambda) \;=\; \mathbb{E}_{q_\lambda(\theta)}\!\big[-\log p(\mathcal{D} \mid \theta)\big] \;+\; D_\alpha\big(q_\lambda(\theta) \,\|\, p(\theta)\big), $$

where θ denotes the BNN weights, λ the variational parameters, and 𝒟 the data. The first term encourages fidelity to the data (here, via Gaussian or problem-specific operator likelihoods), while the second enforces regularity with respect to the prior, tempered by the choice of α.
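A compact sketch of how such an objective can be assembled is given below, assuming a fully factorised Gaussian posterior and prior so that the Rényi α-divergence has a closed form that is additive over weight dimensions; the single-sample data term, the toy numbers, and all variable names are illustrative rather than the paper’s implementation.

```python
import numpy as np

def renyi_mean_field(mu_q, sig_q, mu_p, sig_p, alpha):
    """Rényi alpha-divergence between two fully factorised (mean-field)
    Gaussians; it is additive over the independent weight dimensions."""
    var_star = (1.0 - alpha) * sig_q**2 + alpha * sig_p**2
    quad = alpha * (mu_q - mu_p) ** 2 / (2.0 * var_star)
    log_term = ((1.0 - alpha) * np.log(sig_q) + alpha * np.log(sig_p)
                - 0.5 * np.log(var_star)) / (alpha - 1.0)
    return np.sum(quad + log_term)

def gaussian_nll(y, mu, sig):
    """Negative log-likelihood of observations under the predictive Gaussian."""
    return np.sum(0.5 * np.log(2.0 * np.pi * sig**2) + (y - mu) ** 2 / (2.0 * sig**2))

def gvi_objective(y, pred_mu, pred_sig, q_mu, q_sig, prior_mu, prior_sig, alpha):
    """Generalised variational objective: a data-misfit term (here a single
    Monte Carlo weight sample produced pred_mu, pred_sig) plus the Rényi
    alpha-divergence from the variational posterior to the prior."""
    return gaussian_nll(y, pred_mu, pred_sig) + renyi_mean_field(
        q_mu, q_sig, prior_mu, prior_sig, alpha)

# Toy numbers, purely illustrative.
n_w = 1000                                   # number of BNN weights
q_mu, q_sig = np.zeros(n_w), 0.05 * np.ones(n_w)
p_mu, p_sig = np.zeros(n_w), np.ones(n_w)
y = np.array([0.1, 0.2]); mu = np.array([0.12, 0.18]); sig = np.array([0.05, 0.05])
for alpha in [0.5, 1.25, 2.0]:
    print(alpha, gvi_objective(y, mu, sig, q_mu, q_sig, p_mu, p_sig, alpha))
```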
Tabulated results (see the paper’s NMSE and NLL tables) show that tuning α can substantially reduce NMSE and NLL compared to both KLD-based VI (α → 1) and deterministic DeepONets. As a concrete example, for the antiderivative operator, the best-performing α (1.25) achieved an NMSE reduction of over 50% relative to the KLD limit, with improvements also observed in NLL calibration.
| Problem | Best α | NMSE Improvement | NLL Improvement | 
|---|---|---|---|
| Antiderivative | 1.25 | >50% | Significant | 
| Diffusion–Reaction | 2.0 | Substantial | Substantial | 
| Advection–Diffusion | 0.5 | Notable | Notable | 
4. Applications: Mechanics and Physical Systems
The framework is evaluated across a set of operator learning problems in mechanics and physics:
- Antiderivative Operator: Predicts an integral transformation of input functions, capturing not only the mean but credible intervals for uncertainty, with lower error and calibrated NLL using a tuned α.
- Gravity Pendulum: With a tuned α, the framework’s predictive mean and uncertainty quantification improve upon KLD-based baselines.
- Diffusion–Reaction PDE: Uncertainties are spatially and temporally sensitive, reflecting solution roughness, with increased predictive reliability.
- Advection–Diffusion: Tuning α to 0.5 yields better mass-covering and reduces both NMSE and NLL in transport-dominated regimes.
These results imply that adaptive divergence selection is essential for matching posterior mass concentration to the operator task’s inherent uncertainty structure—e.g., mass-covering for multimodal or smooth predictive distributions, mode-seeking when sharp predictions are critical.
5. Hyperparameterization and Robustness Tuning
The flexibility of the framework stems from the tunable hyperparameter α. As established in robust variational inference theory (Knoblauch et al., 2019):
- α > 1: Mode-seeking, emphasizing best-fitting regions and down-weighting regularization.
- α < 1: Mass-covering, suited to ill-posed or multimodal inference tasks.
Optimal α is problem-dependent and determined via standard cross-validation; for example, α = 1.25 optimizes the antiderivative operator, while α = 0.5 is preferable for advection–diffusion. This adaptability addresses the prior misspecification that frequently affects Bayesian operator learning.
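Since α is selected by standard cross-validation without a prescribed search procedure, the snippet below sketches one plausible grid-search wrapper; `validation_metrics` is a hypothetical stand-in returning synthetic numbers so the example runs, and the combined NMSE + NLL score is an assumed selection rule, not the paper’s criterion.

```python
import numpy as np

def validation_metrics(alpha, rng):
    """Hypothetical stand-in: train an Alpha-VI DeepONet with this alpha and
    return (NMSE, NLL) on a validation split. Here it returns synthetic
    numbers so the snippet runs end to end."""
    return rng.uniform(0.01, 0.1), rng.uniform(-1.0, 1.0)

def select_alpha(candidates, rng, weight_nll=1.0):
    """Simple validation grid search over alpha, scoring each candidate by
    NMSE plus a weighted NLL term (an assumed, not paper-specified, rule)."""
    scores = {}
    for alpha in candidates:
        nmse, nll = validation_metrics(alpha, rng)
        scores[alpha] = nmse + weight_nll * nll
    return min(scores, key=scores.get), scores

# A typical grid spanning mass-covering (<1) and mode-seeking (>1) regimes.
candidate_alphas = [0.5, 0.75, 1.0, 1.25, 1.5, 2.0]
best_alpha, all_scores = select_alpha(candidate_alphas, np.random.default_rng(0))
print("selected alpha:", best_alpha)
```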
6. Methodological and Application Impact
Alpha-VI DeepONet demonstrates that generalised variational inference, with flexible divergences and BNN-based operator neural architectures, leads to both improved predictive accuracy and credible uncertainty estimates in operator learning with physical and mechanical systems. The framework is validated across ODE and PDE settings, consistently outperforming deterministic and KLD-based approaches in NMSE and NLL, with results confirming greater robustness to prior misspecification and model uncertainty. This suggests that generalised divergences, especially when adaptively tuned, can be strategically leveraged to enhance real-world operator learning in fluid mechanics, structural systems, and complex scientific computing scenarios.
7. Prospects for Further Research
Potential future avenues include:
- Integration of more expressive variational families (e.g., normalizing flows) to surpass mean-field limitations and better capture complex posteriors.
- Development of more efficient Monte Carlo estimation methods for α-divergences to mitigate the computational expense at extreme values of α.
- Application to broader scientific domains (e.g., aerospace, climate, biological modeling) where robust and calibrated operator learning is critical.
- Exploration of automated or adaptive -selection schemes during training to further enhance calibration and predictive performance for diverse problem regimes.
The Alpha-VI DeepONet thus constitutes a notable advancement in operator learning by fusing robust theoretical developments from generalised variational inference with practical neural operator architectures for uncertainty quantification in the sciences (Knoblauch et al., 2019; Lone et al., 2024).