Distribution-Based Sensitivity Analysis
- Distribution-Based Sensitivity Analysis (DBSA) is a framework that evaluates the global impact of distributional perturbations on model outputs by integrating algebraic, information-theoretic, and robust optimization methods.
- It employs closed-form sensitivity functions, optimal covariation schemes, and divergence metrics (e.g., f-divergence, CvM) to quantify and characterize changes beyond mean-based approaches.
- DBSA finds applications in discrete Bayesian models, black-box simulations, and causal inference, offering actionable insights for robust decision-making under uncertainty.
Distribution-Based Sensitivity Analysis (DBSA) quantitatively characterizes how changes or uncertainties in probability distributions—whether model parameters, structural assumptions, or input distributions—affect probabilistic outputs, counterfactuals, or key performance indices. DBSA provides a rigorous framework for assessing not merely mean-based or variance-based sensitivity, but the global, distributional impact of perturbations, covariations, or adversarial misspecification, and unifies a diverse suite of tools from parametric models, information theory, robust optimization, and statistical estimation.
1. Algebraic and Polynomial Frameworks in Discrete Models
A foundational setting for DBSA is finite discrete models, such as Bayesian networks (BNs) and their generalizations. Here, the joint law of a random vector $Y$ with parameter vector $\theta$ admits a polynomial representation: for each atomic outcome $y$, the probability $P(Y = y)$ is a sum of monomials in the entries of $\theta$, consolidated in an "interpolating polynomial" (Leonelli et al., 2015). When the model is multilinear (all exponents in each monomial are $0$ or $1$), as in ordinary BNs, CSI-trees, or chain event graphs, DBSA exhibits several key properties:
- Closed-form Sensitivity Functions: The probability of any event of interest $A$ under a perturbed parameter vector $\tilde{\theta}$ can be written as a multilinear polynomial in the perturbed CPT entries,
$$\tilde{P}(A) = \sum_{k} c_k \prod_{j \in J_k} \tilde{\theta}_j,$$
where the coefficients $c_k$ depend only on the original parameters and the covariation scheme.
- Chan–Darwiche Distance Factorization: Distributional change is quantified by the Chan–Darwiche (CD) distance,
$$D_{CD}(P, \tilde{P}) = \ln \max_{y} \frac{\tilde{P}(y)}{P(y)} - \ln \min_{y} \frac{\tilde{P}(y)}{P(y)},$$
where the two extrema are, respectively, the maximum and minimum ratios of perturbed to original atomic probabilities. If no two varied CPT parameters appear in the same monomial, the distance simplifies to a maximum (resp. minimum) over the varied/covaried CPT entries.
- Optimal Covariation Schemes: Proportional covariation—rescaling each non-varied entry $\theta_j$ of the row containing the varied entry $\theta_i \mapsto \tilde{\theta}_i$ to
$$\tilde{\theta}_j = \frac{1 - \tilde{\theta}_i}{1 - \theta_i}\, \theta_j$$
—minimizes the CD distance among all valid covariations; Theorem 4.6 provides a general proof of this optimality across all multilinear models (see the code sketch below).
If the defining polynomial is not multilinear (e.g., dynamic BNs), sensitivity functions become higher-degree and closed-form optimal covariation need not exist.
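As a concrete illustration of the last two points, the following minimal sketch (with hypothetical numbers) applies proportional covariation to a single CPT row and evaluates the resulting CD distance; for a full BN the ratios would range over atomic probabilities rather than a single row.

```python
import numpy as np

def proportional_covariation(theta, i, new_val):
    """Vary theta[i] to new_val and rescale the remaining entries
    proportionally so the row still sums to one."""
    theta = np.asarray(theta, dtype=float)
    out = theta * (1.0 - new_val) / (1.0 - theta[i])
    out[i] = new_val
    return out

def cd_distance(p, q):
    """Chan-Darwiche distance: ln max(q/p) - ln min(q/p) over atoms with p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    ratios = q[p > 0] / p[p > 0]
    return np.log(ratios.max()) - np.log(ratios.min())

theta = np.array([0.5, 0.3, 0.2])                         # original CPT row (hypothetical)
tilde = proportional_covariation(theta, i=0, new_val=0.6)
print(tilde, cd_distance(theta, tilde))                   # [0.6 0.24 0.16], ln(1.2/0.8)
```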
2. Distributional Distance and Divergence-Based Indices
DBSA extends to continuous and high-dimensional settings by substituting variance-based indices with metrics quantifying changes between entire distributions.
- $f$-Divergence Indices: For a convex generator $f$ with $f(1) = 0$, the $f$-sensitivity index of input $X_i$ is the expected divergence between the conditional and unconditional output densities,
$$H_{i,f} = \mathbb{E}_{X_i}\left[ \int f\!\left( \frac{p_{Y \mid X_i}(y)}{p_Y(y)} \right) p_Y(y)\, dy \right]$$
(Rahman, 2015). Familiar density-based indices are special cases, e.g., Borgonovo's $\delta$ for $f(t) = \tfrac{1}{2}|t - 1|$ and mutual information for $f(t) = t \log t$.
- Cramér–von Mises Index (CvM): The CvM index measures the integrated squared difference between conditional and unconditional output CDFs,
$$S^{i}_{CvM} = \frac{\int \mathbb{E}\left[ \left( F(t) - F^{X_i}(t) \right)^2 \right] dF(t)}{\int F(t)\left( 1 - F(t) \right) dF(t)},$$
where $F$ is the CDF of the output $Y$ and $F^{X_i}(t) = P(Y \le t \mid X_i)$ (Gamboa et al., 2015). The CvM index is sensitive to global distributional changes and generalizes Sobol' by accounting for the entire output law, not just its moments; a pick-freeze estimation sketch follows this list.
- Discrepancy-based Indices: Discrepancy functions quantify non-uniformity in the empirical joint distribution of an input and output (e.g., star-discrepancy, symmetric or wrap-around discrepancy), often used as computationally efficient proxies for variance or information-based indices (2206.13470).
These measures are invariant to monotone transformations of the output, satisfy null-independence postulates (the index vanishes when input and output are independent), and can be estimated via kernel density, polynomial surrogate, or finite-sample schemes.
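To make the CvM index concrete, here is a minimal pick-freeze Monte Carlo sketch. It assumes a toy two-input model $g$ (an Ishigami-like function chosen for illustration) and uses the identity $\mathbb{E}[\mathbf{1}\{Y \le t\}\mathbf{1}\{Y^i \le t\}] = \mathbb{E}[F^{X_i}(t)^2]$, where $Y$ and $Y^i$ share input $X_i$; this is a sketch of the idea, not the exact estimator of any particular reference.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):                                        # toy test model (hypothetical)
    return np.sin(x[:, 0]) + 7 * np.sin(x[:, 1]) ** 2

def cvm_index(g, i, d, n=3000):
    """Pick-freeze Monte Carlo estimate of the CvM index for input i:
    Y and Y_i share coordinate i; all other inputs are redrawn."""
    x  = rng.uniform(-np.pi, np.pi, (n, d))
    x2 = rng.uniform(-np.pi, np.pi, (n, d))
    x2[:, i] = x[:, i]                           # freeze input i
    y, yi = g(x), g(x2)
    le   = y[None, :] <= y[:, None]              # 1{Y_k <= t} at each t = Y_j
    le_i = yi[None, :] <= y[:, None]             # 1{Y^i_k <= t} for the matched pair
    F = le.mean(axis=1)                          # empirical CDF F(t)
    num = (le & le_i).mean(axis=1) - F ** 2      # E[(F - F^{X_i})^2] at each t
    return num.mean() / (F * (1 - F)).mean()     # integrate dF via the Y sample

for i in range(2):
    print(f"S_CvM[{i}] ≈ {cvm_index(g, i, d=2):.3f}")
```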
3. DBSA in Stochastic Simulation, Composite Models, and Black-Box Systems
In settings where the underlying model or simulation is only sampled via a black-box (including LLMs, physical simulators, or MCMC-based estimation), DBSA quantifies how distributional outputs change as inputs (tokens, parameters, or stochastic seeds) are perturbed or replaced.
- Black-box LLM Sensitivity: Perturb each token in a prompt to its nearest neighbors; for each perturbation, estimate the change in the model's output distribution via an energy distance between the embedding clouds of baseline and perturbed outputs. The resulting per-token sensitivity profile reveals which tokens drive model decisions, without requiring internal gradient access (Rauba et al., 12 Dec 2025); see the energy-distance sketch after this list.
- Differentiable Black-Box Sampling: Analytical formulae compute $\partial x / \partial \theta$ for a sample $x$ drawn from $p(x; \theta)$ by exploiting the local invertibility between $x$ and the vector of conditional CDFs $u = F(x; \theta)$. Both full-matrix and diagonal Newton schemes enable black-box, sample-wise differentiation, facilitating automated gradient computation for complex sample-based inference (Chuang et al., 12 Aug 2025); the second sketch below illustrates the one-dimensional case.
- Bayesian Estimation: When only joint samples are available, Bayesian nonparametric or partition-based methods (e.g., Dirichlet process mixtures, Bayesian bootstrap) yield posterior distributions over probabilistic sensitivity indices (variance-based, density-based, CDF-based, etc.), enabling credible interval estimation even at moderate sample sizes (Antoniano-Villalobos et al., 2019).
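For the token-level LLM analysis, the core computation is an energy distance between two clouds of output embeddings. A minimal sketch (the biased V-statistic form, which includes the zero diagonal terms; all names and shapes are hypothetical stand-ins):

```python
import numpy as np

def energy_distance(a, b):
    """Squared energy distance between samples a (n, d) and b (m, d):
    2 E||A - B|| - E||A - A'|| - E||B - B'|| (V-statistic form)."""
    def mean_pdist(x, y):
        diff = x[:, None, :] - y[None, :, :]          # all pairwise differences
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return 2 * mean_pdist(a, b) - mean_pdist(a, a) - mean_pdist(b, b)

# Hypothetical usage: embed outputs under the baseline prompt and under the
# prompt with token j swapped for a nearest neighbour, then score token j.
rng = np.random.default_rng(0)
base_emb = rng.normal(size=(64, 384))                 # stand-in baseline cloud
pert_emb = rng.normal(size=(64, 384)) + 0.3           # stand-in perturbed cloud
print(energy_distance(base_emb, pert_emb))
```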
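For the differentiable-sampling idea, the one-dimensional case already shows the mechanism: with the uniform draw $u$ held fixed, implicitly differentiating $F(x; \theta) = u$ gives $\partial x / \partial \theta = -\partial_\theta F(x; \theta) / p(x; \theta)$. A minimal sketch on an exponential distribution, where the answer is known in closed form (this illustrates the principle, not the paper's full multivariate Newton scheme):

```python
import numpy as np

def dx_dtheta(x, theta, dF_dtheta, pdf):
    """Sample-wise derivative via the implicit function theorem applied to
    F(x; theta) = u with u held fixed: dx/dtheta = -dF/dtheta / p(x; theta)."""
    return -dF_dtheta(x, theta) / pdf(x, theta)

# Exponential(rate=theta): x = -ln(1 - u)/theta, so dx/dtheta = -x/theta exactly.
theta, u = 2.0, 0.7
x = -np.log(1.0 - u) / theta
dF = lambda x, th: x * np.exp(-th * x)        # d/dtheta of F = 1 - exp(-theta x)
p  = lambda x, th: th * np.exp(-th * x)       # density p(x; theta)
print(dx_dtheta(x, theta, dF, p), -x / theta) # the two numbers agree
```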
4. Sensitivity Analysis under Input Distributional Uncertainty
DBSA is tightly connected to distributional robustness and parametric sensitivity of statistical models:
- Perturbative Analysis: For an input parameter space $\Theta$, parametrize the output metrics as $m(\theta)$ and seek the perturbation direction $v$ maximizing relative sensitivity,
$$v^{\star} = \arg\max_{v}\, \left\| \nabla_\theta m(\theta)^{\top} v \right\|^2 \quad \text{subject to} \quad \tfrac{1}{2}\, v^{\top} I(\theta)\, v \le \varepsilon,$$
a budget on the induced KL divergence, so that the principal eigenvectors of the moment matrix of score-based sensitivities capture simultaneous directions of maximal distributional perturbation (Yang, 2022).
- Information-Theoretic Limits: In the single-metric setting, the Fisher information matrix $I(\theta)$ governs local sensitivity via
$$D_{KL}\!\left( p_{\theta + \delta} \,\middle\|\, p_{\theta} \right) = \tfrac{1}{2}\, \delta^{\top} I(\theta)\, \delta + o(\|\delta\|^2),$$
reflecting the Kullback–Leibler geometry of the underlying model; a numerical check follows this list.
- Robust Bounds: In treatment-effect or identification analyses, partial identification bounds derived from $\Lambda$-constrained likelihood ratios or $f$-divergence balls quantify global distributional uncertainty. Estimators of the sharp lower/upper bounds and their inferential properties are available for a broad array of causal estimands, including average treatment effects, regression discontinuity, and instrumental variables, often expressible in closed form (Dorn et al., 2023, Jin et al., 2022).
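The quadratic KL expansion is easy to verify numerically. A minimal check for a Gaussian mean shift, where $I(\mu) = 1/\sigma^2$ and the expansion is in fact exact:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, s, delta = 0.0, 1.5, 0.1

# Monte Carlo estimate of KL(p_{mu+delta} || p_mu) for N(., s^2):
# sample from the shifted distribution and average the log-ratio.
x = rng.normal(mu + delta, s, size=200_000)
log_ratio = ((x - mu) ** 2 - (x - mu - delta) ** 2) / (2 * s ** 2)
kl_mc = log_ratio.mean()

fisher = 1.0 / s ** 2                         # Fisher information I(mu)
print(kl_mc, 0.5 * delta * fisher * delta)    # both ~ delta^2 / (2 s^2)
```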
5. Specialized DBSA: Goal-Oriented and Dependent Input Structures
- Goal-Oriented Sensitivity Analysis (GOSA): Define sensitivity in terms of a user-specified contrast $\psi(y; \theta)$ whose minimizer $\theta^{\star} = \arg\min_{\theta} \mathbb{E}[\psi(Y; \theta)]$ is the output feature of interest (mean, quantile, likelihood, etc.). The contrast-based index
$$S^{\psi}_{i} = \frac{\min_{\theta} \mathbb{E}[\psi(Y; \theta)] - \mathbb{E}_{X_i}\!\left[ \min_{\theta} \mathbb{E}[\psi(Y; \theta) \mid X_i] \right]}{\min_{\theta} \mathbb{E}[\psi(Y; \theta)]}$$
extends DBSA to arbitrary functional goals, subsuming classical, quantile, probability, and likelihood contrasts (Fort et al., 2013); a worked sketch for the squared-error contrast follows this list.
- Dependent Input Structures: For models with dependent or copula-structured inputs, DBSA requires conditional sampling or representation of the output given any subset of inputs. Copula-based and empirical dependency models efficiently facilitate such conditional distributions. First-order and total-effect indices are built analogously, with consistent U-statistic estimators and central limit properties (Lamboni, 2021).
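For the squared-error contrast $\psi(y; \theta) = (y - \theta)^2$, $\min_{\theta} \mathbb{E}[\psi] = \mathrm{Var}(Y)$ and the contrast index reduces to the first-order Sobol' index. A minimal sketch, approximating the conditioning on $X_i$ by quantile binning (the toy model and bin count are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def contrast_index_sq(y, xi, bins=50):
    """Goal-oriented index for psi(y, t) = (y - t)^2: min_t E[psi] = Var(Y);
    conditioning on X_i is approximated by quantile bins, so the index
    reduces to the first-order Sobol' index."""
    total = y.var()
    edges = np.quantile(xi, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, xi, side="right") - 1, 0, bins - 1)
    cond_var = np.array([y[idx == b].var() for b in range(bins)])
    counts = np.bincount(idx, minlength=bins)
    return (total - np.average(cond_var, weights=counts)) / total

n = 100_000
x = rng.uniform(-np.pi, np.pi, (n, 2))
y = np.sin(x[:, 0]) + 7 * np.sin(x[:, 1]) ** 2
for i in range(2):
    print(f"S_psi[{i}] ≈ {contrast_index_sq(y, x[:, i]):.3f}")
```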
6. Applications in Engineering, Power Systems, and Causal Inference
- Power Systems: For power distribution networks with uncertain injections, probabilistic voltage sensitivity uses DBSA to forecast the full distribution of node voltages. Linearized voltage-sensitivity coefficients propagate input uncertainty through the network, with the resulting voltage magnitudes following Rician-type distributions that allow explicit computation of violation probabilities, outperforming deterministic methods in both efficiency and coverage (Abujubbeh et al., 2020); a simplified propagation sketch follows this list.
- Dynamic Structural Models: In dynamic structural economic models (e.g., demand estimation, labor supply), DBSA quantifies the impact of serial dependence misspecification through entropic optimal transport duals and KL-divergence constraints, yielding sharp and bootstrappable identified parameter bounds (Chen, 25 Oct 2025).
- Causal and Partial Identification: Under unmeasured confounding, $f$-divergence-based sensitivity models and semiparametric efficient estimators provide inferentially valid, one-sided robust confidence intervals for counterfactual means or treatment effects (Jin et al., 2022, Zhang et al., 2019, Kline et al., 19 Apr 2025).
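For the power-system use case, the propagation step can be sketched once linearized sensitivity coefficients are given. The toy example below uses a Gaussian simplification (mean/covariance propagation plus a normal tail probability) rather than the full Rician treatment of complex phasors; all numbers are hypothetical:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical linearized sensitivities S[j, k] = dV_j / dP_k (p.u. per MW)
S = np.array([[0.004, 0.002],
              [0.002, 0.005]])
v0    = np.array([1.00, 0.99])     # operating-point voltage magnitudes (p.u.)
mu_p  = np.array([0.0, -0.5])      # mean injection deviations (MW)
sig_p = np.array([0.3, 0.4])       # injection standard deviations (MW)

# Linear propagation: V ~ v0 + S @ dP, so V is Gaussian with
# mean v0 + S @ mu_p and covariance S @ diag(sig_p^2) @ S.T.
mean_v = v0 + S @ mu_p
cov_v  = S @ np.diag(sig_p ** 2) @ S.T

# Probability of an under-voltage violation V_1 < 0.985 p.u. at node 1.
p_viol = norm.cdf(0.985, loc=mean_v[1], scale=np.sqrt(cov_v[1, 1]))
print(mean_v, p_viol)
```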
7. Practical Considerations, Limitations, and Extensions
- Optimality and Robustness: Multilinearity and proportional covariation simplify computation and guarantee optimality in discrete graphical models, but non-multilinear models require customized analysis (Leonelli et al., 2015).
- Computational Strategies: Methods based on discrepancy, kernel density estimation, or surrogate PDD models dominate in high-dimensional, surrogate, or black-box simulation regimes (2206.13470, Rahman, 2015).
- Interpretability: The choice of sensitivity metric ($f$-divergence, CvM, discrepancy, etc.) should reflect the substantive aim—mean-based, tail, or global-law sensitivity. Bayesian approaches enable uncertainty quantification without extra simulation cost (Antoniano-Villalobos et al., 2019).
- Extension to Deep Learning: DBSA methods compatible with automatic differentiation and gradient-based learning close the loop for differentiable probabilistic inference even in black-box, simulation-based science (Chuang et al., 12 Aug 2025).
- Limitations: Challenges remain for structural models with highly nonlinear or unidentified parameters, high-dimensional density estimation, and for settings where dependency structures or model features preclude direct analytic solution. Carefully chosen deterministic models or high-fidelity surrogates are necessary in such cases.
DBSA thus forms a unified, algebraically and statistically principled paradigm for quantifying, optimizing, and statistically inferring the distributional sensitivity of diverse models, linking algebraic, information-theoretic, and robust optimization perspectives across fields.