Multi-Modal Bayesian Neural Surrogates
- Multi-modal Bayesian neural network surrogate models are probabilistic frameworks that fuse diverse data modalities to deliver calibrated uncertainty in expensive simulations and inverse problems.
- They employ both joint and hierarchical architectures with conjugate last-layer estimation to enhance predictive accuracy and reduce variational approximation errors.
- The models robustly handle missing data and partial observations using Gaussian conditional imputation, proving effective in Bayesian optimization, sensor fusion, and engineering applications.
Multi-modal Bayesian neural network (BNN) surrogate models are probabilistic machine learning frameworks that enable uncertainty-aware emulation of expensive simulations or complex data-generating processes by leveraging information from multiple data modalities or sources. These models provide a unified Bayesian framework for multi-modal data fusion, scalable surrogate modeling, calibrated predictive uncertainty, and learning in high dimensions or under partial observability. The development of multi-modal BNN surrogates addresses critical challenges in modern optimization, inverse problems, and scientific computing where observations are diverse, expensive, or incomplete, and where principled uncertainty quantification is essential.
1. Bayesian Neural Network Surrogates and Multi-Modal Architectures
A multi-modal BNN surrogate model combines standard BNN principles—Bayesian inference over weights, closed-form predictive uncertainty, and full probabilistic treatment of outputs—with architectures capable of ingesting and fusing multiple data sources or modalities. Consider a regression setting with inputs $x \in \mathbb{R}^d$ and outputs $y$ (the quantity of interest), augmented by auxiliary modalities $z_j$ for $j = 1, \dots, J$ (possibly low-fidelity simulations, measurements, or alternative sensors).
Two main architectures emerge:
- Joint Model: Concatenate all available outputs into a response vector $(y, z_1, \dots, z_J)$ and train a single BNN surrogate for the mapping $x \mapsto (y, z_1, \dots, z_J)$. This setup captures joint dependencies among all modalities, with one BNN parameterizing the full vector-valued mapping.
- Layered/Hierarchical Model: Predict auxiliary modalities separately via individual surrogates, then use these predictions as additional input features for the main surrogate that predicts $y$. Mathematically, the predictive model can be written as $\hat{y}(x) = f\big(x, \hat{z}_1(x), \dots, \hat{z}_J(x)\big)$, where each $\hat{z}_j$ is itself a BNN surrogate. This enables the main surrogate to exploit both input features and learned representations from auxiliary sources, mirroring the multi-fidelity and hierarchical Bayesian modeling paradigm. A wiring sketch of both architectures follows this list.
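To make the wiring concrete, here is a minimal runnable sketch of the two architectures, with ordinary ridge regression standing in for the BNN surrogates purely so the example is self-contained (`fit_ridge` and the toy data are illustrative constructs, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, Y, lam=1e-3):
    """Stand-in for a BNN surrogate: returns a linear predictor closure.
    In the actual framework each surrogate would be a BNN."""
    X1 = np.column_stack([X, np.ones(len(X))])        # append a bias column
    Y = np.atleast_2d(Y.T).T                          # ensure 2-D targets
    W = np.linalg.solve(X1.T @ X1 + lam * np.eye(X1.shape[1]), X1.T @ Y)
    return lambda Xq: np.column_stack([Xq, np.ones(len(Xq))]) @ W

# Toy data: inputs X, two auxiliary modalities Z, main output y.
X = rng.normal(size=(64, 2))
Z = np.column_stack([np.sin(X[:, 0]), X[:, 1] ** 2])
y = Z.sum(axis=1, keepdims=True) + 0.1 * rng.normal(size=(64, 1))

# Joint model: one surrogate for the concatenated response [y, z1, z2].
joint = fit_ridge(X, np.column_stack([y, Z]))

# Layered model: auxiliary surrogates feed the main surrogate as inputs.
aux = fit_ridge(X, Z)
main = fit_ridge(np.column_stack([X, aux(X)]), y)

Xq = rng.normal(size=(8, 2))
y_joint = joint(Xq)[:, :1]                            # first column predicts y
y_layered = main(np.column_stack([Xq, aux(Xq)]))
```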
For both architectures, the generic BNN is defined layerwise by:

$$h^{(0)} = x, \qquad h^{(\ell)} = \sigma\!\left(\frac{c_\ell}{\sqrt{n_{\ell-1}}}\, W^{(\ell)} h^{(\ell-1)} + b^{(\ell)}\right), \quad \ell = 1, \dots, L,$$

with weight matrices $W^{(\ell)} \in \mathbb{R}^{n_\ell \times n_{\ell-1}}$, biases $b^{(\ell)} \in \mathbb{R}^{n_\ell}$, nonlinearities $\sigma$ (e.g., ReLU, tanh), layer widths $n_\ell$, and scaling constants $c_\ell$.
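A minimal numpy sketch of this forward pass for a single draw of the weights, assuming $c_\ell = 1$ and a linear readout layer (function and variable names are illustrative):

```python
import numpy as np

def bnn_forward(x, weights, biases, sigma=np.tanh):
    """Layerwise pass h = sigma(W h / sqrt(n_prev) + b) with a linear
    readout; the 1/sqrt(width) factor matches the scaling above."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigma(W @ h / np.sqrt(W.shape[1]) + b)
    W, b = weights[-1], biases[-1]
    return W @ h / np.sqrt(W.shape[1]) + b

# One random draw from an isotropic Gaussian prior over weights and biases.
rng = np.random.default_rng(1)
widths = [3, 32, 32, 1]                               # n_0, ..., n_L
Ws = [rng.normal(size=(m, n)) for n, m in zip(widths[:-1], widths[1:])]
bs = [rng.normal(size=(m,)) for m in widths[1:]]
print(bnn_forward(rng.normal(size=3), Ws, bs))
```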
2. Conjugate Last-Layer Estimation and Stochastic Variational Inference
A central innovation is the hybridization of flexible BNN feature extraction with conditionally conjugate last-layer Bayesian linear regression. Defining $\theta$ as the BNN's hidden weights and $\phi_\theta(x) = h^{(L)}$ as the resulting last-layer features, the final layer predicts:

$$y \mid x, B, \Sigma \sim \mathcal{N}\!\big(B^\top \phi_\theta(x),\, \Sigma\big).$$

This admits a conjugate matrix-normal inverse-Wishart prior:

$$B \mid \Sigma \sim \mathcal{MN}\big(B_0,\, \Lambda_0^{-1},\, \Sigma\big), \qquad \Sigma \sim \mathcal{IW}(\Psi_0, \nu_0).$$
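A minimal numpy sketch of the resulting closed-form posterior update under this parameterization (hyperparameter conventions in the paper may differ):

```python
import numpy as np

def mniw_posterior(Phi, Y, B0, Lam0, Psi0, nu0):
    """Conjugate update for Y = Phi @ B + E with prior
    B | Sigma ~ MN(B0, Lam0^{-1}, Sigma), Sigma ~ IW(Psi0, nu0)."""
    Lam_n = Lam0 + Phi.T @ Phi
    B_n = np.linalg.solve(Lam_n, Lam0 @ B0 + Phi.T @ Y)
    resid = Y - Phi @ B_n
    Psi_n = Psi0 + resid.T @ resid + (B_n - B0).T @ Lam0 @ (B_n - B0)
    return B_n, Lam_n, Psi_n, nu0 + Phi.shape[0]

# Example: p = 8 hidden features, q = 2 outputs, n = 50 training points.
rng = np.random.default_rng(2)
Phi = rng.normal(size=(50, 8))
Y = Phi @ rng.normal(size=(8, 2)) + 0.1 * rng.normal(size=(50, 2))
B_n, Lam_n, Psi_n, nu_n = mniw_posterior(
    Phi, Y, B0=np.zeros((8, 2)), Lam0=np.eye(8), Psi0=np.eye(2), nu0=4.0)
```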
Stochastic variational inference (SVI) is employed such that:
- All hidden-layer parameters $\theta$ are optimized with a variational posterior $q(\theta)$,
- The last-layer weights $B$ and noise covariance $\Sigma$ are integrated out analytically via their conditional posterior given the hidden activations,
- The overall approximation is:

$$q(\theta, B, \Sigma) = q(\theta)\, p(B, \Sigma \mid \theta, \mathcal{D}),$$

where the conditional factor $p(B, \Sigma \mid \theta, \mathcal{D})$ is available in closed form by conjugacy.
This “conditionally conjugate” SVI approach reduces variational approximation errors, accelerates convergence, and gives closed-form uncertainty quantification for the last layer.
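Continuing the sketch above, marginalizing $B$ and $\Sigma$ gives a multivariate Student-t predictive at a new feature vector $\phi_\ast$; the moments below follow one standard convention for the degrees of freedom (an assumption, not necessarily the paper's exact form):

```python
import numpy as np

def last_layer_predictive(phi_star, B_n, Lam_n, Psi_n, nu_n):
    """Mean and covariance of the Student-t predictive with
    nu_n - q + 1 degrees of freedom (requires nu_n - q + 1 > 2)."""
    q = Psi_n.shape[0]
    mean = B_n.T @ phi_star
    scale = 1.0 + phi_star @ np.linalg.solve(Lam_n, phi_star)
    cov = scale * Psi_n / (nu_n - q - 1.0)
    return mean, cov

# Usage with the posterior computed in the previous sketch:
# mean, cov = last_layer_predictive(Phi[0], B_n, Lam_n, Psi_n, nu_n)
```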
3. Handling Missing Modalities and Partial Observations
A common complication in multi-modal surrogate modeling is incomplete observation: not all modalities are observed for every input $x$. The framework accommodates missing data by:
- Partitioning response vectors into observed and missing blocks $(y_{\text{obs}}, y_{\text{mis}})$,
- Using properties of Gaussian conditionals to impute $y_{\text{mis}}$ given $y_{\text{obs}}$ and the last-layer hidden features $\phi_\theta(x)$,
- Augmenting the variational approximation to jointly include $y_{\text{mis}}$, resulting in a tractable update for the posterior:

$$q(\theta, y_{\text{mis}}) = q(\theta)\, p(y_{\text{mis}} \mid y_{\text{obs}}, \theta),$$

with the Gaussian conditional $p(y_{\text{mis}} \mid y_{\text{obs}}, \theta)$ available in closed form.
This permits coherent uncertainty propagation in the presence of partial auxiliary observations, with all conditional expectations computed analytically.
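A minimal sketch of the Gaussian conditional behind this imputation step (the helper and toy covariance are illustrative):

```python
import numpy as np

def gaussian_impute(mu, Sigma, obs_idx, y_obs):
    """Condition N(mu, Sigma) on observed coordinates; return the mean
    mu_m + S_mo S_oo^{-1} (y_obs - mu_o) and covariance
    S_mm - S_mo S_oo^{-1} S_om of the missing block."""
    obs_idx = np.asarray(obs_idx)
    mis_idx = np.setdiff1d(np.arange(len(mu)), obs_idx)
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    mu_cond = mu[mis_idx] + S_mo @ np.linalg.solve(S_oo, y_obs - mu[obs_idx])
    Sig_cond = Sigma[np.ix_(mis_idx, mis_idx)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    return mu_cond, Sig_cond

# Example: a 3-dimensional response with the middle coordinate missing.
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
m, S = gaussian_impute(mu, Sigma, obs_idx=[0, 2], y_obs=np.array([1.0, -0.5]))
```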
4. Performance Metrics and Empirical Evaluation
Model assessment proceeds via systematic benchmarks across scalar and time-series outputs, as well as practical engineering datasets:
- Prediction Bias: Measured by the $L_2$ norm between the predicted posterior mean and the true function values, for both interpolation and extrapolation domains,
- Standardized Error: Defined by normalizing squared prediction error by the posterior predictive variance, with calibration indicated by a mean value near unity (see the sketch after this list),
- Uncertainty Quantification: Evaluated through coverage probabilities, log pointwise predictive densities, and miscalibration area on calibration plots,
- Empirical Findings: On the Branin, Paciorek, wind data, and synthetic time-series tasks, multi-modal BNN surrogates (especially hierarchical/layered models) achieve lower bias and better-calibrated uncertainty than unimodal surrogates.
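For concreteness, a small sketch of these metrics for Gaussian predictive marginals (the 95% interval and the function name are illustrative choices):

```python
import numpy as np
from scipy.stats import norm

def calibration_metrics(y_true, mu, var):
    """Bias (L2 norm), standardized error (mean near 1 indicates
    calibration), 95% coverage, and log pointwise predictive density."""
    bias = np.linalg.norm(mu - y_true)
    std_err = np.mean((y_true - mu) ** 2 / var)
    lo, hi = mu - 1.96 * np.sqrt(var), mu + 1.96 * np.sqrt(var)
    coverage = np.mean((y_true >= lo) & (y_true <= hi))
    lppd = np.mean(norm.logpdf(y_true, loc=mu, scale=np.sqrt(var)))
    return {"bias": bias, "std_err": std_err,
            "coverage": coverage, "lppd": lppd}

# Well-specified Gaussian predictions give std_err ~ 1, coverage ~ 0.95.
rng = np.random.default_rng(5)
print(calibration_metrics(rng.normal(size=1000),
                          np.zeros(1000), np.ones(1000)))
```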
A summary table illustrates the evaluation:
| Model Type | Prediction Bias | Standardized Error |
| --- | --- | --- |
| Uni-modal BNN | Higher | Often >1 (miscalibrated) |
| Joint Multi-modal | Lower | ~1 (well-calibrated) |
| Layered Multi-modal | Lowest | ~1 (best calibration) |
These results confirm that incorporating auxiliary modalities via either joint or layered architectures improves both predictive accuracy and uncertainty quantification—especially when auxiliary sources are informative (as diagnosed via canonical correlation coefficients).
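A compact numpy sketch of the canonical-correlation diagnostic, using the standard QR/SVD construction (the toy data are illustrative):

```python
import numpy as np

def canonical_correlations(A, B):
    """Canonical correlations are the singular values of Qa^T Qb, where
    Qa and Qb orthonormalize the centered data matrices."""
    Qa, _ = np.linalg.qr(A - A.mean(axis=0))
    Qb, _ = np.linalg.qr(B - B.mean(axis=0))
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

# Coefficients near zero suggest an uninformative auxiliary modality.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
Z = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
print(canonical_correlations(X, Z))    # one large, one small coefficient
```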
5. Applications and Use Cases
Applications of multi-modal BNN surrogate models include:
- Bayesian Optimization: Multi-modal surrogates enable more accurate modeling and uncertainty estimation for expensive black-box functions observed with multi-fidelity or heterogeneous data (Lin et al., 2018, Chugh, 2022).
- Inverse Problems: Hierarchical surrogates efficiently combine rapid low-fidelity predictions with scarce high-fidelity observations, yielding calibrated posteriors at reduced computational cost (Yan et al., 2019, Kerleguer et al., 2023).
- Time-Series and Sensor Fusion: Joint and layered BNNs can encode dependencies between high-dimensional time-series or spatial fields and sparse scalar/auxiliary observations (e.g., wind field fusion, environmental monitoring).
- Uncertainty Quantification: Conjugate last-layer inference, alongside robust variational approximations, yields calibrated predictive intervals and supports out-of-distribution detection, both critical for robust engineering design.
6. Theoretical and Computational Considerations
Key mathematical properties and computational features include:
- Modular architecture accommodates arbitrary numbers of modalities and naturally extends to multi-output surrogates,
- Conjugate last-layer exploitation lowers optimization burden, focusing stochastic variational updates on deep representation learning rather than output uncertainty calibration,
- Closed-form conditional posteriors and predictive distributions allow for principled imputation and tractable Monte Carlo or analytic marginalization of prediction uncertainty (a minimal sketch follows this list),
- The approach is scalable to large parameter spaces and high data dimensionality, provided SVI is used for hidden layers and analytic integration for the last layer.
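As a sketch of the Monte Carlo marginalization mentioned above (`moments_fn` is a hypothetical interface; in the framework it would return the analytic last-layer predictive moments for one draw of the hidden weights):

```python
import numpy as np

def mixture_predictive(theta_samples, moments_fn):
    """Average per-sample predictive means and combine variances via the
    law of total variance: Var = E[Var] + Var[E]."""
    mus, vs = zip(*(moments_fn(t) for t in theta_samples))
    mus, vs = np.stack(mus), np.stack(vs)
    return mus.mean(axis=0), vs.mean(axis=0) + mus.var(axis=0)

# Toy usage with a stand-in moments_fn.
rng = np.random.default_rng(4)
thetas = rng.normal(size=(100, 5))
mu, var = mixture_predictive(
    thetas, lambda t: (np.tanh(t).sum(), 0.1 + t[0] ** 2))
```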
Potential limitations arise when the joint statistical informativeness of the auxiliary modalities is weak (low canonical correlation), in which case layered models revert to unimodal baseline performance.
7. Outlook and Significance
Multi-modal BNN surrogate models with conjugate last-layer estimation represent an advance in uncertainty-aware machine learning for expensive data regimes, where leveraging diverse data sources is critical for decision-making under uncertainty. The methodology unifies deep learning-based feature extraction, Bayesian hierarchical modeling, scalable variational inference, and robust missing data handling. Empirical studies demonstrate improved prediction accuracy and reliable uncertainty quantification compared to unimodal surrogates, with immediate relevance to scientific, engineering, and optimization domains where multi-source data and computational efficiency are paramount (Taylor et al., 26 Sep 2025).