Bayesian Sensitivity Analysis
- Bayesian sensitivity analysis is a rigorous framework that quantifies how prior choices affect posterior inferences across varying hyperparameters.
- It leverages importance sampling and control variates to efficiently recycle MCMC samples, reducing variance and computational cost.
- The method has practical applications in Bayesian variable selection and model averaging, enabling robust empirical Bayes analysis and sensitivity visualization.
Bayesian sensitivity analysis encompasses a rigorous set of techniques for quantifying how posterior inferences, model-based probabilities, or related functionals depend on modeling choices in a Bayesian analysis, most often the prior specification. Sensitivity analysis is particularly critical in empirical Bayes procedures, hierarchical modeling, and model selection, where prior hyperparameters or prior families are not uniquely determined and where computational cost precludes exhaustive "rerun and compare" approaches. Efficient computational strategies for Bayesian sensitivity analysis, such as those relying on importance sampling and control variates, enable simultaneous evaluation of posterior summaries over large prior families, facilitating robust inference and principled hyperparameter selection.
1. Sensitivity Analysis with Prior Families: Problem Formulation
In modern Bayesian modeling, the prior law on the parameters $\theta$, denoted $\pi(\theta \mid \lambda)$, is frequently indexed by a multidimensional hyperparameter $\lambda \in \Lambda$, where $\Lambda$ may be continuous or high-dimensional. Posterior expectations of a generic functional $h(\theta)$ then become functions of $\lambda$:

$$
\mathbb{E}_{\lambda}\left[h(\theta) \mid y\right] = \int h(\theta)\, \pi(\theta \mid y, \lambda)\, d\theta,
$$

where $\pi(\theta \mid y, \lambda)$ is the posterior under the prior $\pi(\theta \mid \lambda)$ and the observed data $y$. The central objective is to compute $\mathbb{E}_{\lambda}[h(\theta) \mid y]$ efficiently for all $\lambda \in \Lambda$ and to identify the subsets of $\Lambda$ that yield reasonable or robust inference. This requires a mechanism for estimating posterior expectations and Bayes factors across many hyperparameter choices without constructing a separate MCMC chain for each $\lambda$, a process that would otherwise be computationally infeasible for moderate to large hyperparameter grids.
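To make the object of study concrete, the following toy sketch (not from the source; the conjugate Normal model, data values, and the `posterior_mean` helper are illustrative assumptions) evaluates the map from a prior hyperparameter to a posterior expectation in closed form, which is exactly the function $\lambda \mapsto \mathbb{E}_{\lambda}[h(\theta) \mid y]$ that sensitivity analysis must trace numerically when no closed form exists.

```python
# Toy illustration (assumed conjugate Normal model, not from the source): the map
# prior_var -> E[theta | y] is available in closed form here, which makes the target
# of a sensitivity analysis concrete before any MCMC or reweighting is needed.
import numpy as np

y = np.array([1.2, 0.8, 1.5, 0.9, 1.1])   # observed data; y_i | theta ~ N(theta, 1)
n, ybar = len(y), y.mean()

def posterior_mean(prior_var):
    """E[theta | y] under the prior theta ~ N(0, prior_var)."""
    post_prec = 1.0 / prior_var + n        # conjugate precision update
    return (n * ybar) / post_prec          # precision-weighted mean (prior mean is 0)

# Tracing the sensitivity "curve": posterior mean as a function of the prior variance.
for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(f"prior_var={lam:7.2f}  E[theta|y]={posterior_mean(lam):.4f}")
```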
2. Importance Sampling and Variance Reduction via Control Variates
The foundational methodological insight is that, if one has Markov chain samples from the posterior(s) associated with so-called "skeleton" hyperparameter values $\lambda_1, \dots, \lambda_J$, one may "recycle" these samples to estimate posterior expectations for any $\lambda \in \Lambda$ using importance sampling. For a base chain at $\lambda_1$, the ratio representation for expectations is

$$
\mathbb{E}_{\lambda}\left[h(\theta) \mid y\right]
= \frac{\mathbb{E}_{\lambda_1}\!\left[\, h(\theta)\, \frac{\pi(\theta \mid \lambda)}{\pi(\theta \mid \lambda_1)} \,\middle|\, y \right]}
       {\mathbb{E}_{\lambda_1}\!\left[\, \frac{\pi(\theta \mid \lambda)}{\pi(\theta \mid \lambda_1)} \,\middle|\, y \right]},
$$

with empirical estimators obtained by averaging over the chain. For increased robustness and efficiency, the authors introduce importance sampling from mixture posteriors, specifically

$$
q(\theta) = \sum_{j=1}^{J} \frac{n_j}{N}\, \pi(\theta \mid y, \lambda_j), \qquad N = \sum_{j=1}^{J} n_j,
$$

with mixing proportions $n_j / N$ reflecting the sample sizes of the skeleton chains. Posterior expectations and Bayes factors are then estimated as ratios of integrals with respect to this mixture importance density, requiring the computation of weights involving ratios of prior densities and marginal likelihood terms $m(y \mid \lambda_j)$ (normalizing constants).
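A minimal sketch of the sample-recycling idea under an assumed toy conjugate model (the model, data, and function names are illustrative, not the paper's implementation): draws from a single skeleton posterior at $\lambda_1$ are reweighted by prior density ratios, and the self-normalized ratio estimator recovers posterior means at other hyperparameter values, which can be checked against the closed-form answer available in this toy case.

```python
# Sketch: recycle draws from one "skeleton" posterior at lambda_1 to estimate
# E_lambda[theta | y] for other lambda via the self-normalized ratio identity.
# Only prior density ratios are needed; the likelihood and normalizing constants
# cancel in the ratio.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=20)          # y_i | theta ~ N(theta, 1)
n, ybar = len(y), y.mean()

def post_params(prior_var):
    """Posterior mean and variance under theta ~ N(0, prior_var)."""
    prec = 1.0 / prior_var + n
    return (n * ybar) / prec, 1.0 / prec

def log_prior(theta, prior_var):
    return -0.5 * theta**2 / prior_var - 0.5 * np.log(2 * np.pi * prior_var)

# Skeleton chain: exact posterior draws at lambda_1 (stand-in for an MCMC chain).
lam1 = 1.0
m1, v1 = post_params(lam1)
theta = rng.normal(m1, np.sqrt(v1), size=50_000)

def recycled_mean(lam):
    """Self-normalized importance sampling estimate of E_lam[theta | y]."""
    logw = log_prior(theta, lam) - log_prior(theta, lam1)   # prior ratio weights
    w = np.exp(logw - logw.max())                            # stabilize before normalizing
    return np.sum(w * theta) / np.sum(w)

for lam in [0.1, 0.5, 1.0, 5.0, 20.0]:
    exact = post_params(lam)[0]
    print(f"lambda={lam:5.1f}  IS={recycled_mean(lam):.4f}  exact={exact:.4f}")
```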
The variance of naive importance sampling estimators can be excessive for $\lambda$ far from the skeleton points, especially in high dimensions. To reduce estimator variance, control variates are constructed by exploiting the property that their expectation under the mixture distribution is zero; a standard construction is

$$
z_j(\theta) = \frac{\pi(\theta \mid y, \lambda_j)}{q(\theta)} - 1, \qquad j = 1, \dots, J,
$$

since $\mathbb{E}_q\!\left[\pi(\theta \mid y, \lambda_j)/q(\theta)\right] = 1$. Weighted combinations of these control variates, parameterized by coefficients chosen to minimize variance (often via least squares regression), yield estimators with markedly reduced variance. These estimators are provably asymptotically normal, with variance contributions both from the Monte Carlo sampling and from the estimation of the normalizing constants $m(y \mid \lambda_j)$.
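The following sketch illustrates the control variate mechanism in the same assumed toy setting (the two-component mixture, the skeleton values, and the off-skeleton target are all illustrative). Because the skeleton posteriors are available in normalized form in this conjugate example, the normalizing constants are exact; in general they must be estimated, which contributes the additional variance term mentioned above.

```python
# Sketch: importance sampling from a two-component mixture of skeleton posteriors,
# with control variates z_j = p_j/q - 1 whose expectation under the mixture is zero;
# regression coefficients are fit by least squares.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=20)
n, ybar = len(y), y.mean()

def post(prior_var):
    """Conjugate posterior (mean, variance) under theta ~ N(0, prior_var)."""
    prec = 1.0 / prior_var + n
    return (n * ybar) / prec, 1.0 / prec

skel = [0.5, 5.0]                               # skeleton hyperparameter values
alphas = np.array([0.5, 0.5])                   # mixing proportions (equal chain lengths)
pm = np.array([post(l)[0] for l in skel])
pv = np.array([post(l)[1] for l in skel])

N = 20_000
comp = rng.choice(len(skel), size=N, p=alphas)  # draw from the mixture of posteriors
theta = rng.normal(pm[comp], np.sqrt(pv[comp]))

p = np.array([norm.pdf(theta, pm[j], np.sqrt(pv[j])) for j in range(len(skel))])
q = alphas @ p                                  # mixture importance density

# Target: posterior mean under an off-skeleton hyperparameter value.
m_t, v_t = post(2.0)
f = theta * norm.pdf(theta, m_t, np.sqrt(v_t)) / q     # plain importance sampling integrand
z = (p / q - 1.0).T                                     # control variates, E_q[z_j] = 0

beta, *_ = np.linalg.lstsq(z, f - f.mean(), rcond=None) # variance-minimizing coefficients
plain, adjusted = f.mean(), (f - z @ beta).mean()
print(f"exact={m_t:.4f}  plain IS={plain:.4f}  with control variates={adjusted:.4f}")
print(f"per-draw std: plain={f.std():.3f}  adjusted={(f - z @ beta).std():.3f}")
```

The per-draw standard deviations printed at the end give a rough indication of the variance reduction achieved by the regression-adjusted estimator.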
3. Implementation in Bayesian Variable Selection and Model Averaging
A concrete application is provided for Bayesian linear regression variable selection with a two-dimensional hyperparameter $\lambda = (q, g)$, where $q$ is the a priori inclusion probability for each regressor and $g$ indexes the scale of Zellner's $g$-prior on the regression coefficients. The key computational step is the efficient calculation of the Radon–Nikodym derivative between priors indexed by different $(q, g)$ values, which, after algebraic simplification, involves only ratios and exponents pertaining to $q$, together with a Gaussian density ratio for the coefficients. Writing $\gamma$ for the inclusion indicators, $|\gamma|$ for the number of included regressors, $p$ for the number of candidate regressors, and $X_\gamma$ for the corresponding design submatrix,

$$
\frac{\pi(\gamma, \beta_\gamma \mid q, g)}{\pi(\gamma, \beta_\gamma \mid q', g')}
= \left(\frac{q}{q'}\right)^{|\gamma|} \left(\frac{1-q}{1-q'}\right)^{p-|\gamma|}
  \left(\frac{g'}{g}\right)^{|\gamma|/2}
  \exp\!\left\{ -\frac{1}{2\sigma^{2}}\left(\frac{1}{g} - \frac{1}{g'}\right) \beta_\gamma^{\top} X_\gamma^{\top} X_\gamma \beta_\gamma \right\}.
$$

This quantity requires no matrix inversion, allowing highly efficient updating when evaluating large grids of $(q, g)$ values. The methodology is implemented in the R package "bvslr".
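A sketch of this prior-ratio computation in standard notation (illustrative code, not taken from the "bvslr" package; it assumes independent Bernoulli($q$) inclusion indicators and the $g$-prior $\beta_\gamma \mid \gamma, \sigma^2 \sim N(0,\, g\sigma^2 (X_\gamma^\top X_\gamma)^{-1})$ as written above):

```python
# Illustrative sketch: log Radon-Nikodym derivative between two (q, g) prior settings.
# Only the product beta' X_g' X_g beta is needed; no matrix inversion occurs, so large
# (q, g) grids can be swept cheaply for every stored MCMC draw.
import numpy as np

def log_prior_ratio(gamma, beta_g, X, sigma2, q_new, g_new, q_old, g_old):
    """log [ pi(gamma, beta_gamma | q_new, g_new) / pi(gamma, beta_gamma | q_old, g_old) ].

    gamma  : boolean inclusion indicators, length p
    beta_g : coefficients of the included regressors, length gamma.sum()
    X      : n x p design matrix; sigma2 : residual variance
    """
    p, k = len(gamma), int(gamma.sum())
    Xg = X[:, gamma]
    quad = beta_g @ (Xg.T @ Xg) @ beta_g                    # beta' X'X beta
    log_bernoulli = (k * np.log(q_new / q_old)
                     + (p - k) * np.log((1 - q_new) / (1 - q_old)))
    log_gaussian = (0.5 * k * np.log(g_old / g_new)
                    - 0.5 * (1.0 / g_new - 1.0 / g_old) * quad / sigma2)
    return log_bernoulli + log_gaussian

# Example call on synthetic inputs (purely illustrative values).
rng = np.random.default_rng(2)
X = rng.normal(size=(47, 15))
gamma = np.zeros(15, dtype=bool); gamma[[0, 3, 7]] = True
beta_g = rng.normal(size=3)
print(log_prior_ratio(gamma, beta_g, X, sigma2=1.0,
                      q_new=0.5, g_new=47.0, q_old=0.2, g_old=100.0))
```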
4. Case Study: US Crime Data and Sensitivity Visualization
The methodology is illustrated using the US crime dataset of Vandaele, which consists of a response (crime rate), 15 predictors, and log transformations of the variables consistent with the Bayesian variable selection literature. Bayes factors and posterior inclusion probabilities are computed across a dense grid of hyperparameter values by running MCMC chains at a set of skeleton points (e.g., a grid in $(q, g)$ space) and then propagating the results via importance sampling and control variates. Results show that in this application, posterior model weights and variable inclusion probabilities exhibit considerable sensitivity to prior choices, with optimal hyperparameters identified via empirical Bayes selection (e.g., maximizing posterior predictive performance over $(q, g)$). The analysis also enables visualization of how inferences shift with $(q, g)$, providing quantification and diagnostic insight into sensitivity.
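A sketch of the visualization step only: the `inclusion_prob` function below is a hypothetical placeholder so the script runs end to end; in practice it would be replaced by the importance-sampling/control-variate estimate propagated from the skeleton chains.

```python
# Sketch of plotting a sensitivity surface over the (q, g) grid.
import numpy as np
import matplotlib.pyplot as plt

def inclusion_prob(q, g):
    # Hypothetical placeholder surface; replace with the recycled estimate of
    # P(variable j included | y, q, g) computed from the skeleton chains.
    return 1.0 / (1.0 + np.exp(-(np.log(q / (1 - q)) + 0.3 * np.log(g))))

qs = np.linspace(0.05, 0.95, 60)
gs = np.logspace(0, 3, 60)
Q, G = np.meshgrid(qs, gs)

plt.contourf(Q, G, inclusion_prob(Q, G), levels=20)
plt.yscale("log")
plt.xlabel("prior inclusion probability q")
plt.ylabel("g (Zellner prior scale)")
plt.colorbar(label="posterior inclusion probability")
plt.title("Sensitivity surface over the (q, g) grid")
plt.show()
```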
5. Generality, Computational Considerations, and Limitations
Although the primary focus is variable selection in Gaussian linear models, the methodology generalizes to any Bayesian setting in which a parametric or nonparametric prior is indexed by a hyperparameter $\lambda$ and standard MCMC is feasible at a relatively small number of skeleton points. Two-stage estimation, first of the normalizing constants and then of expectations across many hyperparameter values via importance sampling and control variates, applies to hierarchical, empirical Bayes, and nonparametric models such as Dirichlet process mixtures. The asymptotic theory requires geometrically ergodic chains and suitable moment conditions (e.g., existence of moments of the key estimators).
Key points on computational limitations:
- The dimensionality of $\lambda$ should be small to moderate; otherwise, coverage of the hyperparameter space by skeleton points becomes infeasible and estimator variance increases.
- Variance estimation and skeleton point selection are interlinked; optimal skeleton placement and computational effort trade off against estimator variance.
- This approach does not scale directly to very high-dimensional hyperparameter spaces or to cases where MCMC cannot be efficiently conducted at skeleton points.
6. Summary and Implications for Bayesian Modeling Workflows
The computational approach described enables robust, efficient, and multifaceted Bayesian sensitivity analyses. By formulating expectations and Bayes factors as ratios of integrals re-expressed in terms of different prior choices, and leveraging importance sampling together with control variate variance reduction, practitioners can efficiently explore the effect of prior hyperparameter uncertainty, perform empirical Bayes selection, and visualize sensitivity surfaces. The methods are broadly applicable in model averaging and empirical Bayes frameworks, and, when used judiciously, yield both practical and theoretical benefits in robustness assessment, model selection, and hyperparameter inference within Bayesian settings. This approach bridges the gap between practical computational feasibility and thorough quantitative evaluation of prior-driven sensitivity in posterior inference.