
Quantifying Uncertainty in the Presence of Distribution Shifts

Published 23 Jun 2025 in stat.ML and cs.LG | (2506.18283v1)

Abstract: Neural networks make accurate predictions but often fail to provide reliable uncertainty estimates, especially under covariate distribution shifts between training and testing. To address this problem, we propose a Bayesian framework for uncertainty estimation that explicitly accounts for covariate shifts. While conventional approaches rely on fixed priors, the key idea of our method is an adaptive prior, conditioned on both training and new covariates. This prior naturally increases uncertainty for inputs that lie far from the training distribution, in regions where predictive performance is likely to degrade. To efficiently approximate the resulting posterior predictive distribution, we employ amortized variational inference. Finally, we construct synthetic environments by drawing small bootstrap samples from the training data, simulating a range of plausible covariate shifts using only the original dataset. We evaluate our method on both synthetic and real-world data. It yields substantially improved uncertainty estimates under distribution shifts.

Summary

  • The paper introduces VIDS, a Bayesian framework using a data-conditioned adaptive prior to quantify uncertainty under distribution shift.
  • VIDS is trained by generating synthetic environments via bootstrap resampling of training data, simulating covariate shifts to ensure robustness.
  • Experiments show VIDS yields improved accuracy and better-calibrated uncertainty estimates on diverse datasets under various distribution-shift scenarios.


This paper addresses the critical problem of predictive uncertainty estimation for neural networks under covariate distribution shifts. The authors introduce a Bayesian framework, Variational Inference under Distribution Shift (VIDS), that adaptively quantifies uncertainty by conditioning the prior over model parameters on both training and test covariates. This approach is motivated by the observation that classical Bayesian neural networks, which use fixed priors, fail to increase predictive uncertainty for test inputs that are distant from the training distribution—a scenario frequently encountered in real-world applications such as medical diagnosis and cross-domain image classification.

Methodological Contributions

The central innovation is the introduction of an adaptive, covariate-dependent prior $p(\theta \mid x_{1:n}, x^*)$ over the model parameters $\theta$. This prior is constructed via an energy-based formulation that incorporates both the training covariates and the test covariate, allowing the posterior predictive distribution to reflect increased uncertainty for out-of-distribution (OOD) inputs. The resulting predictive distribution is:

$$p(y^* \mid x^*, x_{1:n}, y_{1:n}) = \int p(y^* \mid x^*, \theta)\, p(\theta \mid x^*, x_{1:n}, y_{1:n})\, d\theta$$
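As an illustration, this predictive integral can be approximated by Monte Carlo sampling from the posterior over $\theta$. The sketch below is a minimal toy version, assuming a diagonal Gaussian posterior and a simple linear likelihood; all names and values are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def predictive_mc(x_star, q_mean, q_std, likelihood_fn, n_samples=1000, rng=None):
    """Monte Carlo estimate of the posterior predictive p(y* | x*, data).

    Draws parameter samples theta ~ N(q_mean, q_std^2) from a diagonal
    Gaussian posterior and averages the likelihood model's predictions,
    so the predictive variance reflects parameter uncertainty.
    """
    rng = rng or np.random.default_rng(0)
    # Sample parameters from the (diagonal) Gaussian posterior.
    thetas = rng.normal(q_mean, q_std, size=(n_samples, len(q_mean)))
    # Per-sample predictions for the test input x_star.
    preds = np.array([likelihood_fn(x_star, th) for th in thetas])
    return preds.mean(), preds.var()

# Toy linear likelihood: y = theta[0] + theta[1] * x.
mean, var = predictive_mc(
    x_star=2.0,
    q_mean=np.array([0.0, 1.0]),
    q_std=np.array([0.1, 0.5]),  # wider posterior -> wider predictive spread
    likelihood_fn=lambda x, th: th[0] + th[1] * x,
)
```

In this toy setup, the predictive mean is near 2.0 and the predictive variance grows with the posterior standard deviations, mirroring how an adaptive prior that widens for OOD inputs would widen the predictive distribution.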

To approximate the intractable posterior, the authors employ amortized variational inference. The variational posterior is parameterized as a function of the test covariate, enabling efficient uncertainty estimation for arbitrary test-time inputs. The inference network aggregates representations of the training covariates and the test covariate, and outputs the parameters of a Gaussian variational distribution over $\theta$.
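A minimal sketch of such an inference network is shown below, assuming mean pooling as the aggregation step over training-covariate representations and arbitrary weight matrices; the authors' actual architecture and aggregation scheme may differ.

```python
import numpy as np

def amortized_posterior_params(train_feats, test_feat, W_agg, W_out):
    """Sketch of an amortized inference network (hypothetical weights).

    Mean-pools the training-covariate representations into a
    permutation-invariant summary, concatenates the test covariate's
    representation, and maps the result to the mean and log-std of a
    Gaussian variational posterior over the parameters theta.
    """
    pooled = train_feats.mean(axis=0)                  # summary of training set
    h = np.tanh(W_agg @ np.concatenate([pooled, test_feat]))
    out = W_out @ h                                    # [mu ; log_sigma]
    d = out.shape[0] // 2
    mu, log_sigma = out[:d], out[d:]
    return mu, np.exp(log_sigma)                       # std must be positive

# Shapes only, with random features and weights.
rng = np.random.default_rng(0)
mu, sigma = amortized_posterior_params(
    train_feats=rng.normal(size=(10, 4)),  # 10 training covariates, 4-dim reps
    test_feat=rng.normal(size=4),
    W_agg=rng.normal(size=(16, 8)),
    W_out=rng.normal(size=(6, 16)),        # 6 outputs -> 3-dim mu, 3-dim sigma
)
```

Because the network conditions on the test covariate, a single forward pass yields posterior parameters for any new input, which is what makes test-time uncertainty estimation cheap.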

A significant practical challenge is the lack of access to true OOD test covariates during training. The authors address this by generating synthetic environments via bootstrap resampling of the training data, simulating a range of plausible covariate shifts. The variational objective is then optimized across these environments, with an additional penalty on the variance of the objective across environments to encourage robustness.
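The environment construction and the variance-penalized objective can be sketched as follows. This is a simplified rendering under stated assumptions: the penalty weight `lam` and the per-environment loss values are illustrative, not the paper's hyperparameters or exact objective.

```python
import numpy as np

def make_environments(X, y, n_envs=5, env_size=32, rng=None):
    """Simulate plausible covariate shifts by drawing small bootstrap
    samples (with replacement) from the original training data."""
    rng = rng or np.random.default_rng(0)
    envs = []
    for _ in range(n_envs):
        idx = rng.choice(len(X), size=env_size, replace=True)
        envs.append((X[idx], y[idx]))
    return envs

def penalized_objective(env_losses, lam=1.0):
    """Mean per-environment loss plus a penalty on its variance across
    environments, discouraging solutions that do well on average but
    poorly on some simulated shifts."""
    env_losses = np.asarray(env_losses, dtype=float)
    return env_losses.mean() + lam * env_losses.var()

# Example: three bootstrap environments from a toy dataset.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
envs = make_environments(X, y, n_envs=3, env_size=5)
```

Uniform per-environment losses leave the objective at their mean, while uneven losses are penalized through the variance term, which is what pushes the learned posterior toward robustness across simulated shifts.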

Empirical Evaluation

The paper presents comprehensive experiments on both synthetic and real-world datasets, including regression and classification tasks under various forms of covariate shift. Competing methods include SNGP, DUE, and DUL, all of which are state-of-the-art distance-aware or Bayesian uncertainty estimation techniques.

Key empirical findings include:

  • Synthetic Regression (Heteroscedastic Linear Model): VIDS achieves the lowest RMSE and is the only method to correctly capture the increasing predictive variance as the test covariate moves away from the training distribution.
  • Synthetic Classification (Logistic Regression with Missing Data): VIDS attains the highest accuracy and lowest calibration error, with uncertainty estimates that are highest in regions of covariate space unobserved during training.
  • Real-World Classification (CIFAR-10-C, CelebA): VIDS consistently outperforms baselines in accuracy under various corruption and attribute shift scenarios.
  • Real-World Regression (UCI Datasets): VIDS yields the lowest or comparable RMSE across all datasets, with more reliable uncertainty quantification under cluster-induced covariate shifts.

The results demonstrate that, compared to existing methods, VIDS provides both improved predictive performance and better-calibrated uncertainty estimates under distribution shift.

Theoretical and Practical Implications

Theoretically, the adaptive prior framework generalizes the classical Bayesian approach by explicitly modeling the dependence of parameter uncertainty on the test covariate. This is a principled solution to the problem of underestimating uncertainty for OOD inputs, which is a well-documented failure mode of standard Bayesian neural networks. The use of synthetic environments, justified via a bootstrap-based argument, provides a practical mechanism for simulating a wide range of potential distribution shifts without requiring access to future test data.

From a practical perspective, the VIDS framework is readily implementable with modern deep learning libraries. The amortized variational inference approach ensures scalability to large datasets and high-dimensional models. The method is compatible with any neural network architecture, as the adaptive prior and inference network operate on learned representations. The computational overhead is primarily due to the need to optimize over multiple synthetic environments, but this is mitigated by parallelization and efficient batching.

Implementation Considerations

  • Computational Requirements: Training VIDS involves repeated forward and backward passes through the inference network for each synthetic environment. However, the use of amortized inference and batching makes the approach tractable for moderate-scale problems.
  • Hyperparameter Selection: The number and size of synthetic environments, as well as the penalty on environment variance, are important hyperparameters. The paper reports that these can be selected via grid search on held-out data.
  • Integration with Existing Pipelines: VIDS can be integrated into existing uncertainty-aware pipelines by replacing the prior and posterior inference components. The method is agnostic to the choice of base model and can be applied to both regression and classification tasks.

Limitations and Future Directions

While VIDS demonstrates strong empirical performance, its reliance on synthetic environments constructed from the training data may limit its ability to anticipate shifts that are entirely disjoint from the training support. The theoretical guarantees require that the support of the test distribution is contained within that of the training distribution. In practice, this assumption may be violated, especially in high-dimensional settings.

Future research directions include:

  • Extending the framework to handle support mismatch, possibly via generative modeling of covariate space.
  • Investigating alternative forms of the adaptive prior, including nonparametric or hierarchical formulations.
  • Applying the method to sequential or streaming data, where distribution shifts may evolve over time.
  • Exploring the integration of causal inference techniques to further disentangle the effects of covariate shift on predictive uncertainty.

Conclusion

This work provides a principled and practical approach to uncertainty quantification under covariate shift, addressing a key limitation of classical Bayesian neural networks. By conditioning the prior on both training and test covariates and leveraging synthetic environments, VIDS delivers improved calibration and robustness in the presence of distribution shifts. The framework is broadly applicable and offers a foundation for further advances in reliable uncertainty estimation for deep learning systems.

