
Functional ANOVA (fANOVA) Decomposition

Updated 2 May 2026
  • Functional ANOVA (fANOVA) is a framework that decomposes multivariate functions into interpretable sums of main effects and interactions under orthogonality constraints.
  • It partitions the variance of a function across individual features and their interactions, supporting robust sensitivity analysis and clearer model interpretability.
  • Approximation techniques such as paired sampling and fractional designs mitigate computational challenges in high-dimensional settings and link theoretical foundations to practical applications.

Functional ANOVA (fANOVA) Decomposition is a canonical framework for decomposing multivariate functions into interpretable sums of main effects and interactions, underpinned by orthogonality constraints. This decomposition is foundational in sensitivity analysis, machine learning model interpretability, surrogate modeling, and global attribution analysis. Its technical rigor and concrete algorithms underlie many recent methods for explainability, kernel methods, and modern neural architectures.

1. Mathematical Structure and Identifiability

Given a real-valued function f: D \subseteq \mathbb{R}^p \to \mathbb{R} and a probability measure p(x) describing the joint distribution of the input features X = (X_1, \ldots, X_p), the functional ANOVA (fANOVA) expansion uniquely expresses f as a sum of components indexed by the feature subsets U \subseteq \{1, \ldots, p\}:

f(x) = \sum_{U \subseteq [p]} f_U(x_U)

where x_U denotes the subvector of x corresponding to the indices in U. Uniqueness of the decomposition is guaranteed by the centering/orthogonality conditions:

  • \mathbb{E}[f_U(X_U)] = 0 for every nonempty U \subseteq [p],
  • \mathbb{E}[f_U(X_U)\, f_V(X_V)] = 0 for U \neq V, which follows automatically under independence or product measures,
  • Equivalently, \mathbb{E}[f_U(X_U) \mid X_V] = 0 for all proper subsets V \subsetneq U.

The fANOVA terms can be recursively constructed via conditional expectations:

f_U(x_U) = \mathbb{E}[f(X) \mid X_U = x_U] - \sum_{V \subsetneq U} f_V(x_V)

or by Möbius inversion:

f_U(x_U) = \sum_{V \subseteq U} (-1)^{|U| - |V|}\, \mathbb{E}[f(X) \mid X_V = x_V]

Hence, the decomposition is linear, hierarchical, and canonically defined (Herren et al., 2022).
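As a concrete illustration (not from the source), the Möbius-inversion construction above can be estimated by plain Monte Carlo for a toy model with independent standard-normal features; for f(x) = x_1 + x_2 + x_1 x_2 the exact components are f_{\{1\}}(x_1) = x_1, f_{\{2\}}(x_2) = x_2, and f_{\{1,2\}} = x_1 x_2, so the estimates can be checked by hand. This is a minimal sketch, assuming a product measure so that conditioning reduces to freezing coordinates:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # toy model: two main effects plus one pairwise interaction
    return x[..., 0] + x[..., 1] + x[..., 0] * x[..., 1]

n, p = 200_000, 2
X = rng.standard_normal((n, p))   # independent features (product measure)

def cond_mean(func, X, V, x):
    """Monte Carlo estimate of E[f(X) | X_V = x_V]: freeze the coordinates
    in V and average over the remaining features (valid under independence)."""
    Z = X.copy()
    if len(V):
        Z[:, list(V)] = x[list(V)]
    return func(Z).mean()

def fanova_component(func, X, U, x):
    """Moebius inversion: f_U(x_U) = sum_{V subset of U} (-1)^(|U|-|V|) E[f | X_V = x_V]."""
    return sum(
        (-1) ** (len(U) - len(V)) * cond_mean(func, X, V, x)
        for r in range(len(U) + 1)
        for V in itertools.combinations(U, r)
    )

x = np.array([0.5, -1.0])
c1 = fanova_component(f, X, (0,), x)      # main effect of x1: close to 0.5
c12 = fanova_component(f, X, (0, 1), x)   # interaction: close to 0.5 * (-1.0) = -0.5
```

The same routine works for any subset U, but note that it already evaluates one conditional expectation per subset of U, which is where the exponential cost discussed below originates.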

2. Variance Partitioning and Interaction Quantification

The imposed orthogonality ensures a variance decomposition:

\mathrm{Var}(f(X)) = \sum_{U \neq \emptyset} \mathrm{Var}(f_U(X_U)) = \sum_{U \neq \emptyset} \sigma_U^2

The quantity \sigma_U^2 = \mathrm{Var}(f_U(X_U)) measures the contribution of the variables in U (including all interactions among them) to the total output variance. Special cases:

  • Main-effect strength (U = \{j\}): \sigma_j^2 = \mathrm{Var}(f_{\{j\}}(X_j)),
  • Second-order interaction (U = \{j,k\}): \sigma_{jk}^2 = \mathrm{Var}(f_{\{j,k\}}(X_j, X_k)),
  • Higher-order terms are analogous.

This variance partitioning is the backbone of global sensitivity analysis (e.g., Sobol’ indices), model interpretability, and identification of critical variable groupings (Herren et al., 2022).
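The first-order Sobol' indices mentioned above can be estimated without ever forming the components explicitly, using the standard pick-freeze (Saltelli-style) estimator. A minimal sketch, again assuming the toy model f(x) = x_1 + x_2 + x_1 x_2 with independent standard normals, for which Var(f) = 3 and each first-order index is exactly 1/3:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # toy model with known variance split: Var = 1 + 1 + 1 = 3,
    # so each first-order Sobol' index equals 1/3
    return x[..., 0] + x[..., 1] + x[..., 0] * x[..., 1]

n, p = 200_000, 2
A = rng.standard_normal((n, p))   # two independent sample matrices
B = rng.standard_normal((n, p))
fA, fB = f(A), f(B)
total_var = fA.var()

def first_order_index(j):
    """Pick-freeze estimator of S_j = Var(f_{j}(X_j)) / Var(f(X)):
    correlate runs that share only feature j."""
    ABj = B.copy()
    ABj[:, j] = A[:, j]           # share only column j with A
    return np.mean(fA * (f(ABj) - fB)) / total_var

S1, S2 = first_order_index(0), first_order_index(1)
```

The remaining variance, 1 - S1 - S2 ≈ 1/3 here, is attributable to the interaction term \sigma_{12}^2.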

3. Recursive Construction, Computational Challenges, and Approximations

A full fANOVA decomposition, even for moderate p, requires evaluating 2^p conditional expectations. For models with large p, as often encountered in ML, exact computation is infeasible. Several pragmatic strategies are used:

  • Paired sampling/regression (as in SHAP): Sample coalitions U in increasing order of size and perform weighted least-squares regression to recover approximations to fANOVA-derived attributions.
  • Fractional factorial and block designs: Omit high-order interactions (assumed negligible) by considering only subsets U up to size k.
  • Breadth-first or importance-guided selection: Use screening metrics (e.g., the L2 norm of excluded subsets) to select a small collection of important subsets U.
  • Variance-based measures (Sobol’ indices, global sensitivity analysis): Guide sampling and approximation choices based on estimated effect sizes (Herren et al., 2022).

The interpretability of approximate decompositions is conditioned on the set of evaluated coalitions and the reference input distribution p(x).
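The savings from the size-k truncation strategy above are easy to quantify: instead of 2^p coalitions, only \sum_{r \le k} \binom{p}{r} remain. An illustrative count (p = 20 and k = 2 are arbitrary choices, not from the source):

```python
from itertools import combinations
from math import comb

p, k = 20, 2

full = 2 ** p                                       # all coalitions
truncated = sum(comb(p, r) for r in range(k + 1))   # up to pairwise terms

# the actual truncated coalition set, as used by a size-limited design
coalitions = [U for r in range(k + 1) for U in combinations(range(p), r)]

print(full, truncated)   # 1048576 vs 211
```

Even at p = 20 the pairwise truncation shrinks the coalition set by roughly four orders of magnitude, which is why "assumed negligible" high-order interactions are the price paid for tractability.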

4. Connection to Shapley Values

The classical Shapley value from cooperative game theory is deeply connected to fANOVA. For feature j:

\phi_j(x) = \sum_{S \subseteq [p] \setminus \{j\}} \frac{|S|!\,(p - |S| - 1)!}{p!} \left[ v(S \cup \{j\}) - v(S) \right]

with value function v(S) = \mathbb{E}[f(X) \mid X_S = x_S].

Expanding via the fANOVA yields: \phi_j(x) = \sum_{U \ni j} \frac{1}{|U|}\, f_U(x_U)

Thus, each ANOVA term is evenly split among its participating features in the Shapley value. This provides the mathematical justification for the use of fANOVA in SHAP and related explainability frameworks: SHAP values are weighted aggregations of fANOVA components. The framework accommodates arbitrary feature distributions, including settings where the empirical marginal, multivariate Gaussian, or local reference distribution is used (Herren et al., 2022).
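The even-split identity can be checked numerically against the classical weighted-marginal-contribution formula. A minimal sketch (not from the source) using the hand-derived components of the toy model f(x) = x_1 + x_2 + x_1 x_2 under independent standard-normal features:

```python
from itertools import combinations
from math import factorial

# Hand-derived fANOVA components of f(x) = x1 + x2 + x1*x2 under
# independent standard-normal features, evaluated at one input:
x = {0: 0.5, 1: -1.0}
components = {(0,): x[0], (1,): x[1], (0, 1): x[0] * x[1]}

def shapley_from_fanova(j):
    """phi_j = sum over U containing j of f_U(x_U) / |U| (even split)."""
    return sum(val / len(U) for U, val in components.items() if j in U)

def value(S):
    """v(S) = E[f(X) | X_S = x_S]; under independence, components with
    features outside S average to zero and drop out."""
    return sum(val for U, val in components.items() if set(U) <= set(S))

def shapley_bruteforce(j, p=2):
    """Classical weighted-marginal-contribution formula."""
    others = [i for i in range(p) if i != j]
    phi = 0.0
    for r in range(p):
        for S in combinations(others, r):
            w = factorial(len(S)) * factorial(p - len(S) - 1) / factorial(p)
            phi += w * (value(S + (j,)) - value(S))
    return phi

phi1 = shapley_from_fanova(0)   # 0.5 + (-0.5)/2 = 0.25
```

Both routes give identical attributions here, as the expansion guarantees: the main effect goes wholly to its feature, and the interaction term is halved between features 1 and 2.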

5. Choice of Reference Feature Distribution and Its Impact

The expectations defining the fANOVA components f_U are inherently distribution-dependent. Practical implementations can choose:

  • (i) Marginal empirical distribution: Features treated as independent.
  • (ii) Multivariate Gaussian fit: Preserves second-order dependencies.
  • (iii) Local Gaussianization: Constructs a Gaussian around a specific input.
  • (iv) Point mass ("baseline"): Uses a fixed reference input.

Different choices yield distinct decompositions and, consequently, different SHAP or sensitivity allocations. These effects illuminate the central challenge: fANOVA-based attributions and interpretations are not intrinsic to the model but are tied to the chosen distributional assumptions, underscoring the arbitrariness of the "baseline" in high-dimensional contexts (Herren et al., 2022).
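The baseline-dependence is visible even for p = 2. A minimal sketch (illustrative values, not from the source) using choice (iv), a point-mass reference, where conditional expectations reduce to plug-in evaluations and the Möbius inversion can be written out by hand:

```python
def f(x1, x2):
    return x1 * x2

x1, x2 = 2.0, 3.0   # the input being explained (arbitrary illustrative values)

def components_for_baseline(b1, b2):
    """fANOVA under a point-mass reference at (b1, b2): each conditional
    expectation is a plug-in evaluation, so Moebius inversion is explicit."""
    f0 = f(b1, b2)
    f1 = f(x1, b2) - f0
    f2 = f(b1, x2) - f0
    f12 = f(x1, x2) - f1 - f2 - f0
    return {"f0": f0, "f1": f1, "f2": f2, "f12": f12}

at_zero = components_for_baseline(0.0, 0.0)  # all mass lands in the interaction
at_one = components_for_baseline(1.0, 1.0)   # mass shifts into the main effects
```

With baseline (0, 0) the entire output f(2, 3) = 6 is attributed to the interaction term; with baseline (1, 1) it splits as f0 = 1, f1 = 1, f2 = 2, f12 = 2. Same function, same input, different decomposition.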

6. High-dimensional and Applied Contexts

Application domains place varying computational and modeling constraints on fANOVA:

  • Machine Learning Explainability: Models are tractable to evaluate, but input space is high-dimensional (hundreds/thousands of features). Full fANOVA/SHAP is computationally infeasible, driving reliance on approximation and aggressive sparsification.
  • Physical/Engineering Sensitivity Analysis: Models (e.g., PDE solvers) are expensive to run, but feature space is restricted (few covariates). fANOVA is used for in-depth global analysis at comparatively high per-sample cost.
  • Practical Interpretation: In high dimensions, feature attributions derived from fANOVA (e.g., SHAP values) are only meaningful relative to the particular set of coalitions and the chosen reference distribution. Complete fANOVA-based SHAP is theoretically elegant but limited by computational feasibility and the non-uniqueness of the decomposition under different baselines (Herren et al., 2022).

7. Significance, Limitations, and Implications

fANOVA is indispensable for rigorous variable importance assessment, structured model interpretation, and bridging statistical sensitivity analysis with modern machine learning explainability. Key implications:

  • Theoretical correctness (uniqueness, orthogonality) is preserved only under strict distributional and computational settings; practical usage requires careful approximation and explicit reporting of reference choices.
  • The deep connection to Shapley values puts cooperative game theoretic explanations on firm statistical ground but highlights sensitivity to decomposition choices.
  • Future research targets improved algorithms for scalable, distribution-robust fANOVA, and hybrid approaches that balance statistical rigor with computational viability (Herren et al., 2022).

No single fANOVA decomposition is universally privileged; interpretability is inexorably contingent on both computational and distributional choices, necessitating transparency and methodological caution in real-world practice.
