Functional ANOVA Overview

Updated 7 February 2026

Functional ANOVA is a framework that decomposes multivariate functions into uniquely defined, orthogonal main effects and interactions for clear interpretability.
It leverages statistical properties to achieve precise variance decomposition and compute Sobol sensitivity indices that illuminate feature importance.
Modern estimation methods, including spline penalties and boosting techniques, ensure scalable and robust computation in high-dimensional settings.

Functional ANOVA (fANOVA) is a canonical framework for decomposing multivariate functions into uniquely defined, orthogonal components representing main effects and interactions of input variables. This decomposition underpins modern interpretability, sensitivity analysis, design of experiments, and nonparametric regression, and provides a unified representation for feature attribution in machine learning and global sensitivity analysis. The following sections detail the formalism, identifiability conditions, estimation methods, connections to explainability, theoretical advances, and recent algorithmic developments.

1. Canonical fANOVA Decomposition and Identifiability

Let $f:\mathbb{R}^p \to \mathbb{R}$ be square-integrable with respect to a probability distribution $p(X)$ on $\mathbb{R}^p$ . The functional ANOVA decomposition expresses $f$ as a sum over all subsets $S \subseteq \{1,\ldots,p\}$ : $f(x) = \sum_{S \subseteq \{1,\ldots,p\}} f_S(x_S)$ where $x_S = (x_j : j \in S)$ , $f_\emptyset = \mathbb{E}[f(X)]$ , and for $S \neq \emptyset$ ,

$f_S(x_S) = \mathbb{E}[f(X)\mid X_S = x_S] - \sum_{T \subset S} f_T(x_T)$

This can also be written in inclusion–exclusion form: $f_S(x_S) = \sum_{T\subseteq S} (-1)^{|S|-|T|} \mathbb{E}[f(X) \mid X_T = x_T]$ The components $f_S$ are mutually orthogonal and satisfy: $\forall j \in S,\quad \int f_S(x_S)\, p(x_j) dx_j = 0$ ensuring identifiability and preventing interaction terms from "leaking" into lower-order effects (Herren et al., 2022).

The decomposition is unique as long as the distribution $p(X)$ is fixed and nondegenerate. However, if multiple plausible distributions exist, the decomposition is unique only within a "core" of distributions yielding the same conditional expectations; otherwise, multiple non-equivalent expansions may result (Borgonovo et al., 2018).

2. Statistical Properties and Variance Decomposition

The orthogonality of the $f_S$ under $p(X)$ provides a variance decomposition: $\operatorname{Var}[f(X)] = \sum_{S \neq \emptyset} \operatorname{Var}[f_S(X_S)]$ This directly gives the Sobol' sensitivity indices:

First-order: $S_i = \operatorname{Var}[f_{\{i\}}(X_i)] / \operatorname{Var}[f(X)]$
Total-effect: $S_{T_i} = \sum_{i \in S} \operatorname{Var}[f_S(X_S)] / \operatorname{Var}[f(X)]$
Higher-order: similarly defined for |S| > 1

The total variance explained by all effects of order $|S| = k$ is $\sum_{|S|=k} \operatorname{Var}[f_S(X_S)]$ . The effective dimension, quantifying the dominant order of interactions, is defined as $D_S = \mathbb{E}[|T|]$ , with $T$ distributed as $P(T = S) \propto \operatorname{Var}[f_S(X_S)]$ (Borgonovo et al., 2018).

3. Computation and Estimation in Machine Learning

SHAP and Feature Attribution

SHAP values for black-box model interpretability are linear combinations of the ANOVA components. For feature $i$ , the Shapley value is: $\varphi_i = \sum_{j=1}^p \frac{1}{j} \sum_{S \subseteq \{1,\ldots,p\} : i \in S, |S|=j} f_S(x_S)$ Estimating Shapley values exactly requires computing conditional expectations for all $2^p$ coalitions, but practical algorithms sample coalitions, use linear regression with Shapley-driven weights, and exploit problem structure to make the computation tractable (Herren et al., 2022, Fumagalli et al., 2024).

Low-order structures (main effects and two-way interactions) permit aggressive dimensionality reduction: if higher-order terms are negligible, $2p$ function evaluations suffice for full recovery. SHAP and related methods can restrict attention to sparsity patterns known a priori or determined by screening (e.g., variance-based, Hooker's L2 cost-of-exclusion) (Herren et al., 2022).

Modern Estimation Algorithms

Spline and Penalized Methods: Hierarchical total-variation penalties and group-Lasso in spline bases select both sparse component sets and knot locations, enabling efficient and interpretable estimation (Yang et al., 2019).
Boosted Trees and Neural Networks: GAMI-Tree (Hu et al., 2022), GAMI-Lin-T (Hu et al., 2023), ANOVA-TPNN (Park et al., 21 Feb 2025), and ANOVA-BART (Park et al., 3 Sep 2025) all explicitly model and estimate fANOVA decompositions, enforcing orthogonality either directly or via post-hoc "purification" algorithms (Lengerich et al., 2019).
Large-scale Emulation: Multi-resolution functional ANOVA uses hierarchically nested basis expansions estimated by group-lasso to emulate functions of high-dimensional input efficiently (scaling to $10^6$ samples) (Sung et al., 2017).

In all cases, identifiability is enforced by ensuring that each component satisfies marginal zero-mean constraints under the relevant measure, either via explicit basis construction, projection, or post-processing ("mass-moving" purification algorithms (Lengerich et al., 2019)).

4. Inference and Theory: Confidence, Testing, Rates

The functional Bahadur representation and orthogonal tensor-product Sobolev structure provide the basis for precise effect-wise inference in smoothing spline ANOVA models (Cho et al., 2 Feb 2026). For each effect $S$ , minimizers of penalized least squares admit pointwise CLTs: $\frac{\sqrt{n}(\hat f_S(x_S) - f_S^*(x_S))}{\sigma \sqrt{\sigma_S^2(x_S)}} \to_{\mathcal{D}} \mathcal{N}(0,1)$ allowing construction of asymptotic $(1-\alpha)$ pointwise confidence intervals. For testing the global null $f_S^* \equiv 0$ , Wald-type statistics have power matching the minimax distinguishable boundary up to log-factors.

Optimal convergence rates for estimation are: $\|\hat f_S - f_S^*\|_{S,\lambda} = O_{\mathbb{P}}(n^{-m/(2m+1)}(-\log n)^{(|S|-1)/2})$ for interaction order $|S|$ , with main effects achieving the univariate-optimal rate (Cho et al., 2 Feb 2026).

When partial derivatives of $f$ are observed, convergence rates are improved as if the target model was of lower effective order: for a $d$ -way model with $p$ derivative types, the optimal rates are as for a $(d-p)$ -way model with no derivatives (Dai et al., 2017).

5. Robustness, Distributional Effects, and Sensitivity

Robust fANOVA models address sensitivity to outliers by employing heavy-tailed error models (e.g., t-process) or robust $M$ -estimators and permutation testing (RoFANOVA), yielding bounded influence and guarded type-I error in contaminated functional data settings (Zhang et al., 2018, Centofanti et al., 2021).

The choice of feature distribution $p(X)$ is fundamental: different marginal, conditional, or baseline imputations of missing features induce distinct decompositions with dramatic impacts on attributions and variance decompositions (Herren et al., 2022, Fumagalli et al., 2024, Borgonovo et al., 2018). When multiple plausible input distributions are available, fANOVA expansions may not be unique; practitioners can aggregate via mixture-of-measures, leading to non-orthogonal average "mixed" effects, or seek cores in which all measures agree on effect functions, quantifying sensitivity to distributional uncertainty (Borgonovo et al., 2018).

Kernel-based functional ANOVA in RKHS enables unified statistical tests of input-output dependence, embedding each component random variable in a Hilbert space and generalizing variance-based analysis to broader dependence structures, with classic Sobol' indices recovered as a special (quadratic-kernel) case when the effect distributions are Gaussian (Lamboni, 2023).

6. Extensions: Functional Data, Covariance, and Modern Applications

Functional ANOVA extends to group-comparison of curves or surfaces, where hypotheses concern differences in mean or covariance functions across groups:

Global significance tests based on the maximum pointwise F-statistic $F_{\max}$ (Zhang et al., 2013), permutation envelope methods (Mrkvicka et al., 2016), and robust $F$ -statistics (Centofanti et al., 2021) provide exact or high-power solutions under weak or contaminated functional noise assumptions.
In covariance comparison between populations of functions, fANOVA is operationalized geometrically via Procrustes–Wasserstein distances between Gaussian covariance operators; transport-based ANOVA tests offer superior power in detecting subtle, high-dimensional second-order differences and facilitate tangent-space PCA of the main axes of covariance variation (Masarotto et al., 2022).

Wavelet-domain fANOVA implements Bayesian spike-and-slab models over multiscale coefficients, where Markov grove graphical priors enable clustering and sharing of signal across time-frequency regions, yielding exact, linear-complexity posterior inference (Ma et al., 2016).

7. Practical Considerations and Modern Algorithmic Strategies

Effective practical fANOVA modeling hinges upon the following choices and strategies:

Interaction limitation: Restrict attention to main effects and low-order interactions, justified when higher-order components carry minimal variance—deduced by variance screening or domain knowledge.
Sparsity and regularization: Use sparsity-inducing penalties (group-Lasso, hierarchical TV) to select influential components without overfitting (Yang et al., 2019, Sung et al., 2017).
Distribution selection: Match the conditional expectation and orthogonality conditions to the data-generating mechanism, especially when features are dependent (Herren et al., 2022, Fumagalli et al., 2024).
Component identifiability and purification: Enforce zero-mean (orthogonality) constraints using explicit basis design, projection, or post-hoc purification ("mass-moving") algorithms to avoid interpretational contradictions (Lengerich et al., 2019, Park et al., 21 Feb 2025).
Computation: For high-dimensional settings, leverage boosting, block-coordinate descent, sampling/approximation of coalitions, or hierarchically organized multi-resolution representations (Hu et al., 2022, Sung et al., 2017).
Uncertainty quantification: Apply recent Bayesian and functional Bahadur-based effect-wise inference for honest confidence intervals and effect-wise hypothesis testing (Park et al., 1 Oct 2025, Cho et al., 2 Feb 2026).

In summary, functional ANOVA provides a rigorous and adaptable framework for decomposing complex functions into interpretable, orthogonally structured effects, bridging the demands of interpretability, inference, sensitivity analysis, and scalable nonparametric estimation in contemporary data analysis (Herren et al., 2022, Fumagalli et al., 2024, Cho et al., 2 Feb 2026, Borgonovo et al., 2018, Yang et al., 2019).

Markdown Upgrade to Chat

References (20)

Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation (2022)

Functional ANOVA with Multiple Distributions: Implications for the Sensitivity Analysis of Computer Experiments (2018)

Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory (2024)

Hierarchical Total Variations and Doubly Penalized ANOVA Modeling for Multivariate Nonparametric Regression (2019)

Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models (2022)

Interpretable Machine Learning based on Functional ANOVA Framework: Algorithms and Comparisons (2023)

Tensor Product Neural Networks for Functional ANOVA Model (2025)

Bayesian Additive Regression Trees for functional ANOVA model (2025)

Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models (2019)

10.

Multi-Resolution Functional ANOVA for Large-Scale, Many-Input Computer Experiments (2017)

11.

Effect-Wise Inference for Smoothing Spline ANOVA on Tensor-Product Sobolev Space (2026)

12.

Minimax Optimal Rates of Estimation in Functional ANOVA Models with Derivatives (2017)

13.

Robust functional ANOVA model with t-process (2018)

14.

Robust Functional ANOVA with Application to Additive Manufacturing (2021)

15.

Kernel-based measures of association between inputs and outputs based on ANOVA (2023)

16.

A New Test for One-Way ANOVA with Functional Data and Application to Ischemic Heart Screening (2013)

17.

A one-way ANOVA test for functional data with graphical interpretation (2016)

18.

Transportation-Based Functional ANOVA and PCA for Covariance Operators (2022)

19.

Efficient functional ANOVA through wavelet-domain Markov groves (2016)

20.

Bayesian Neural Networks for Functional ANOVA model (2025)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Functional ANOVA.