FANOVA-GP Prior: Sensitivity Analysis Framework
- The paper introduces the FANOVA-GP prior, an orthogonal GP framework that decomposes functions into additive effects for sensitivity analysis.
- The methodology constructs effect-specific kernels through conditional orthogonalization, enabling scalable computation and clear interpretability.
- It provides analytic Sobol’ and Shapley indices for feature attributions, demonstrating effectiveness in high-dimensional and functional output scenarios.
The FANOVA-GP prior (Functional ANOVA Gaussian Process prior) provides a nonparametric Bayesian framework for variance-based sensitivity analysis and effect decomposition in computer experiments, particularly targeting problems involving functional or high-dimensional outputs. By enforcing explicit orthogonality among all additive and interaction effects, it generalizes the classical ANOVA decomposition to arbitrary distributions, nonlinear dependencies, and both scalar and functional responses. Key constructions include conditional-orthogonality via kernel conditioning, closed-form effect-specific kernels, and efficient algorithms for both effect separation and attributions such as Sobol' and Shapley indices.
1. Mathematical Foundations of the FANOVA-GP Prior
The FANOVA-GP prior models the latent function $f(x)$, with input $x = (x_1, \dots, x_d) \in \mathcal{X} \subseteq \mathbb{R}^d$, as an explicit additive decomposition:

$$f(x) = \sum_{u \subseteq \{1, \dots, d\}} f_u(x_u),$$

where $u$ indexes any subset of features, and $x_u$ denotes the subvector corresponding to $u$. Each component $f_u$ is given a zero-mean Gaussian process prior independent of other components, $f_u \sim \mathcal{GP}(0, k_u)$, ensuring orthogonality of effects under the feature-space measure: $\mathbb{E}_{x_j}[f_u(x_u)] = 0$ for every $j \in u$. The aggregate prior thus corresponds to a GP with kernel

$$k(x, x') = \sum_{u} k_u(x_u, x'_u).$$
In the case of functional outputs, a joint variable $z = (x, t)$ is considered, with $t \in \mathcal{T}$ indexing the output domain (e.g., a spatial coordinate or time point), and the decomposition becomes

$$f(x, t) = \sum_{u} f_u(x_u, t),$$

where each $f_u$ is a zero-mean GP with covariance $k_u\big((x_u, t), (x'_u, t')\big)$ and conditional independence among components (Tan et al., 15 Jun 2025, Mohammadi et al., 20 Aug 2025).
2. Kernel Construction and Conditional-Orthogonality
Orthogonality of components is enforced at the kernel level. For each feature $j$, a base positive-definite kernel $k_j(x_j, x'_j)$ (e.g., squared-exponential) is orthogonalized to yield a zero-mean kernel:

$$\tilde{k}_j(x_j, x'_j) = k_j(x_j, x'_j) - \frac{\mathbb{E}_{s}\big[k_j(x_j, s)\big]\,\mathbb{E}_{s}\big[k_j(s, x'_j)\big]}{\mathbb{E}_{s, s'}\big[k_j(s, s')\big]},$$

with expectations taken under the (empirical) marginal of feature $j$. Each interaction kernel for subset $u$ is then constructed as the product

$$k_u(x_u, x'_u) = \prod_{j \in u} \tilde{k}_j(x_j, x'_j).$$

The full kernel is the sum over all subsets $u$.
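As a minimal numerical sketch of this construction (not the papers' implementation; kernel choice, sample, and all names here are illustrative), one can center a base kernel against an empirical sample and build subset kernels as Hadamard products of the centered one-dimensional kernels:

```python
import numpy as np

def sq_exp(a, b, ls=1.0):
    """Squared-exponential base kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def orthogonalize(base_kernel, ref):
    """Center a 1-D kernel against the empirical measure given by sample `ref`,
    so the result has (empirically) zero mean in each argument."""
    def k_tilde(a, b):
        Kab = base_kernel(a, b)
        Ka = base_kernel(a, ref).mean(axis=1)   # E_s[k(a, s)]
        Kb = base_kernel(ref, b).mean(axis=0)   # E_s[k(s, b)]
        m = base_kernel(ref, ref).mean()        # E_{s,s'}[k(s, s')]
        return Kab - np.outer(Ka, Kb) / m
    return k_tilde

def subset_kernel(X, Y, u, k_tilde_per_dim):
    """Effect-specific kernel for feature subset u: a Hadamard product
    of the per-feature centered kernels."""
    K = np.ones((X.shape[0], Y.shape[0]))
    for j in u:
        K = K * k_tilde_per_dim[j](X[:, j], Y[:, j])
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
k_tilde = [orthogonalize(sq_exp, X[:, j]) for j in range(3)]
K_12 = subset_kernel(X, X, (0, 1), k_tilde)
# Each centered kernel integrates to ~0 against the empirical measure:
print(np.abs(k_tilde[0](X[:, 0], X[:, 0]).mean(axis=1)).max())
```

The centering is exact against the reference sample by construction, which is what makes the resulting effect components empirically zero-mean in every coordinate.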
For functional-output decompositions, an output-domain kernel $k_{\mathcal{T}}(t, t')$ is introduced, and the subsetwise kernel is

$$k_u\big((x_u, t), (x'_u, t')\big) = k_{\mathcal{T}}(t, t') \prod_{j \in u} \tilde{k}_j(x_j, x'_j).$$
Orthogonality is guaranteed by conditioning the GP prior for $f_u$ on the constraint that its mean under each single-feature marginal is zero, for every $j \in u$ and all remaining coordinates $x_{u \setminus \{j\}}$:

$$\mathbb{E}_{x_j}\big[f_u(x_u)\big] = 0.$$

This is achieved analytically through Gaussian conditioning, where the same kernel formula arises from the removal of the mean component. Consequently, all effect components are mutually orthogonal in $L^2$ for the empirical feature distribution, and each effect has zero expectation in each coordinate (Tan et al., 15 Jun 2025, Mohammadi et al., 20 Aug 2025).
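The zero-mean property can be checked directly: draws from a GP with the orthogonalized kernel average to zero under the empirical measure used for centering. A small sketch (assuming an SE base kernel and a uniform sample standing in for the feature marginal; both are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
ref = rng.uniform(-2, 2, size=200)   # empirical stand-in for the x_j marginal

K = np.exp(-0.5 * (ref[:, None] - ref[None, :]) ** 2)    # SE base kernel matrix
row_mean = K.mean(axis=1)
K_tilde = K - np.outer(row_mean, row_mean) / K.mean()    # orthogonalized kernel matrix

# Sample paths from GP(0, K_tilde) at the reference points; each draw should
# integrate to (numerically) zero against the empirical measure.
L = np.linalg.cholesky(K_tilde + 1e-6 * np.eye(len(ref)))  # jitter: K_tilde is singular
draws = L @ rng.standard_normal((len(ref), 5))
print(np.abs(draws.mean(axis=0)).max())   # ~0 up to jitter noise
```

The residual is on the order of the jitter, reflecting that $K_{\tilde{}}\mathbf{1} = 0$ holds exactly for the centered Gram matrix.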
3. Analytical Indices for Sensitivity Analysis
Once the posterior GP is fitted, one obtains a posterior mean function decomposed by effect,

$$m(x, t) = \sum_{u} m_u(x_u, t),$$

with each $m_u$ computable via closed-form kernel evaluations and GP weights.
Variance-based sensitivity analysis is realized by computing, for each effect $u$ and output location $t$, the local variance

$$V_u(t) = \mathrm{Var}_{x_u}\big[m_u(x_u, t)\big],$$

and the total local variance $V(t) = \sum_u V_u(t)$, with the local Sobol' index

$$S_u(t) = \frac{V_u(t)}{V(t)}.$$

Closed-form expressions for $V_u(t)$ are available: writing $m_u$ as a weighted sum of kernel evaluations with GP weights $\alpha$, $V_u(t)$ is a quadratic form $\alpha^{\top} \widehat{C}_u(t)\, \alpha$, where $\widehat{C}_u(t)$ collects empirical expectations of kernel products over the $x_u$ marginal (Tan et al., 15 Jun 2025).
Averaging these indices over $t$ yields global (expected conditional variance, ECV) variances and ECV indices,

$$S_u^{\mathrm{ECV}} = \frac{\mathbb{E}_t\big[V_u(t)\big]}{\mathbb{E}_t\big[V(t)\big]},$$

enabling comprehensive attribution of both main effects and interactions.
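The index computations above reduce to variances over the input sample, followed by an average over output locations. A toy sketch (the two "fitted" main-effect components below are hypothetical stand-ins, not output of an actual GP fit):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_t = 1000, 25
x = rng.uniform(-1, 1, size=(n, 2))      # input sample (rows)
t = np.linspace(0, 1, n_t)               # output locations (columns)

# Hypothetical posterior-mean components m_u(x_u, t), one per effect subset u.
m = {
    (0,): np.sin(np.pi * x[:, :1]) * (1 + t),     # main effect of feature 0
    (1,): (x[:, 1:2] ** 2 - 1 / 3) * np.cos(t),   # main effect of feature 1
}

V = {u: comp.var(axis=0) for u, comp in m.items()}   # local variances V_u(t)
V_tot = sum(V.values())                              # total local variance V(t)
S_local = {u: V[u] / V_tot for u in V}               # local Sobol' indices S_u(t)
S_ecv = {u: V[u].mean() / V_tot.mean() for u in V}   # global ECV indices

print({u: round(s, 3) for u, s in S_ecv.items()})
```

Because the local variances sum to the total by construction, both the local and the ECV indices sum to one across effects.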
4. Efficient Computation and Inference Procedures
The computational workflow closely follows that of standard GP regression, with added steps for empirical integration and kernel assembly:
- Empirical estimation of the marginal feature densities (or direct use of the observed marginals) for construction of each orthogonalized kernel $\tilde{k}_j$.
- Assembly of the additive, orthogonal kernel matrix $K$ via Hadamard products or, for the additive representation, via Newton's identities for elementary symmetric polynomials, leading to $O(d^2)$ complexity per kernel evaluation.
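The symmetric-polynomial trick is worth spelling out: summing the product kernels over all subsets of each order amounts to evaluating elementary symmetric polynomials of the $d$ per-feature kernel values, which Newton's identities compute from power sums in $O(d^2)$ rather than by enumerating $2^d$ subsets. A minimal sketch (illustrative, not the papers' code):

```python
import numpy as np

def elementary_symmetric(Z):
    """e_0..e_d of the values in Z along its first axis, via Newton's identities.
    Z has shape (d, ...): one (array of) per-feature kernel value(s) per feature.
    Uses O(d^2) terms instead of enumerating all 2^d subsets."""
    d = Z.shape[0]
    p = [(Z ** i).sum(axis=0) for i in range(1, d + 1)]   # power sums p_1..p_d
    e = [np.ones_like(Z[0])]                              # e_0 = 1
    for k in range(1, d + 1):
        ek = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1)) / k
        e.append(ek)
    return e

# e_k sums the products of kernel values over all subsets of size k:
e = elementary_symmetric(np.array([[2.0], [3.0]]))
print([float(v[0]) for v in e])   # [1.0, 5.0, 6.0]: e_1 = 2+3, e_2 = 2*3
```

Applied elementwise to the $d$ per-feature Gram matrices, the same recursion assembles the order-$k$ blocks of the additive kernel matrix in one pass.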
The covariance matrix in the functional-output case is the Kronecker-structured

$$K = K_X \otimes K_{\mathcal{T}},$$

with $K_X$ the (additive, orthogonalized) input kernel matrix and $K_{\mathcal{T}}$ the output-domain kernel matrix. The marginal likelihood is optimized w.r.t. the hyperparameters (effect scales $\sigma_u^2$, noise variance $\sigma_\varepsilon^2$, length scales) using gradient-based solvers, and the overall variance scale admits a closed-form maximizer.
If outputs are observed on a regular grid in $x$ and $t$, the full kernel decomposes as a Kronecker product, reducing the cost of inference from cubic in the total number of observations to cubic in each grid dimension separately (Tan et al., 15 Jun 2025).
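The Kronecker saving follows from standard Kronecker algebra: factorize each factor separately and never form the full $nm \times nm$ matrix. A sketch of the linear solve at the heart of GP inference (SE kernels and the grid sizes are illustrative; the dense solve is included only as a cross-check on a tiny grid):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, s2 = 8, 6, 0.1                      # grid sizes and noise variance (toy values)

def se_kernel(pts):
    return np.exp(-0.5 * (pts[:, None] - pts[None, :]) ** 2)

K_X = se_kernel(rng.normal(size=n))       # input-factor Gram matrix
K_T = se_kernel(np.linspace(0, 1, m))     # output-domain Gram matrix
Y = rng.normal(size=(n, m))               # observations on the n x m grid

# Solve (K_X kron K_T + s2*I) alpha = vec(Y) via per-factor eigendecompositions.
lx, Qx = np.linalg.eigh(K_X)
lt, Qt = np.linalg.eigh(K_T)
D = np.outer(lx, lt) + s2                 # eigenvalues of the regularized Kronecker sum
alpha = Qx @ ((Qx.T @ Y @ Qt) / D) @ Qt.T

# Cross-check against the dense solve (only feasible for tiny grids).
dense = np.linalg.solve(np.kron(K_X, K_T) + s2 * np.eye(n * m), Y.ravel())
print(np.allclose(alpha.ravel(), dense))  # True
```

Each eigendecomposition is cubic only in its own grid dimension, which is where the stated speedup comes from.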
The implementation for non-functional outputs benefits from explicit Möbius inversion and recursion over symmetric polynomials, avoiding enumeration of all $2^d$ subsets, with all key operations scaling quadratically in $d$ (Mohammadi et al., 20 Aug 2025).
5. Shapley Attributions and Explainability
The FANOVA-GP family admits exact, closed-form computation of Shapley values for both local (instance-wise) and global (variance-based) feature attributions at quadratic time complexity. The stochastic Shapley value is defined via the cooperative game over function components, capturing the expected contribution (as well as uncertainty) of each input to the functional or scalar output. Global Shapley values quantify feature importance for the model's overall sensitivity structure.
These attributions rest on a Möbius representation of the FANOVA decomposition and recursive algorithms leveraging Newton's identities for elementary symmetric polynomials, facilitating scalable and axiomatically sound explainability for structured probabilistic models (Mohammadi et al., 20 Aug 2025).
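For orthogonal FANOVA decompositions, the global variance-game Shapley value has a particularly simple form: each component variance $V_u$ is split equally among the features in $u$, i.e., $\phi_j = \sum_{u \ni j} V_u / |u|$. A hedged sketch with toy (illustrative, not fitted) component variances:

```python
# Toy component variances V_u for three features; keys are feature subsets u.
V = {(0,): 4.0, (1,): 2.0, (0, 1): 1.0, (2,): 3.0}

def shapley_from_fanova(V, d):
    """Global Shapley attributions from orthogonal FANOVA component variances:
    each interaction variance is shared equally among its participants."""
    phi = [0.0] * d
    for u, var in V.items():
        for j in u:
            phi[j] += var / len(u)
    return phi

phi = shapley_from_fanova(V, 3)
print(phi)   # [4.5, 2.5, 3.0]: feature 0 gets 4 + 1/2, feature 1 gets 2 + 1/2
```

The attributions satisfy the efficiency axiom by construction: they sum to the total variance $\sum_u V_u$, and the loop over components is what keeps the computation polynomial rather than exponential in the number of features.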
6. Nonparametric, Data-Driven, and Orthogonal Properties
No fixed basis functions are required, and the orthogonality constraint is imposed analytically in the kernel. This enables:
- Fully nonparametric modeling, with data-driven orthogonality valid for any observed feature marginal distribution.
- No strong distributional assumptions, as all required kernel integrals are approximated empirically from the data.
- All effect orders (including high-order interactions) present in the prior, controlled by separate scale hyperparameters; uninformative high-order terms are naturally shrunk by their learned variances.
- Equivalence of computational costs to ordinary GP regression, with only minor preprocessing for empirical integral estimation.
This approach yields an explicit, orthogonal, and interpretable ANOVA decomposition for complex, nonlinear, and functional-output computer experiments, obviating the need for manual basis selection or uniform input assumptions, and with analytic variance-based indices available without resorting to Monte Carlo estimation (Tan et al., 15 Jun 2025).
7. Practical Relevance and Applications
The practical utility of FANOVA-GP priors, including the FOAGP variant for functional outputs, is demonstrated by their effective orthogonal effect decomposition and variance analysis in both simulated and real engineering applications, such as fuselage shape control. This framework provides both practitioners and theorists with an analytically tractable, scalable, and robust tool for nonparametric sensitivity analysis, interpretable uncertainty quantification, and input attribution in a wide range of complex, black-box modeling scenarios (Tan et al., 15 Jun 2025, Mohammadi et al., 20 Aug 2025).