ANOVA-BART: Structured Bayesian Additive Trees
- The paper introduces ANOVA-BART, a method that decomposes regression functions into main effects and interactions for clearer interpretation.
- It employs a hierarchical Bayesian framework with regularized tree ensembles to capture nonlinearities and quantify uncertainty.
- Theoretical results establish nearly minimax-optimal posterior convergence rates, and experiments show improved predictive performance over standard BART.
ANOVA Bayesian Additive Regression Trees (ANOVA-BART) is a nonparametric Bayesian modeling approach that integrates the ensemble learning principles of Bayesian Additive Regression Trees (BART) with the functional Analysis of Variance (ANOVA) decomposition. ANOVA-BART is specifically designed to enhance interpretability while maintaining the flexibility, predictive performance, and uncertainty quantification of BART. It achieves this by decomposing a regression function into additive main effects and interaction components, each represented and inferred using separate ensembles of regression trees. This structure enables direct estimation of the contributions from factors and their interactions, offering deeper insights into the underlying data-generating process and allowing for principled component selection.
1. Functional ANOVA Decomposition in BART
ANOVA-BART extends the standard BART modeling paradigm by explicitly encoding a functional ANOVA decomposition within the sum-of-trees framework. In the classical setting, BART models the unknown regression function as
$$f(x) = \sum_{t=1}^{m} g(x; T_t, M_t),$$
where each $g(\cdot\,; T_t, M_t)$ is a weak Bayesian regression tree indexed by its structure $T_t$ and set of leaf parameters $M_t$. BART excels at capturing nonlinearities and high-order interactions without explicit feature engineering, but it produces models in which the marginal contributions of main effects and specific interactions are not explicitly separated.
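To make the sum-of-trees form concrete, here is a minimal sketch of how an ensemble of small regression trees is evaluated at a point. The `Node` class, the "go left when x[var] <= cut" convention, and the example leaf values are illustrative assumptions, not taken from the paper or from any BART library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Internal node when `var` is set; otherwise a leaf holding `value`."""
    var: Optional[int] = None       # index of the splitting predictor (None for a leaf)
    cut: float = 0.0                # threshold: go left when x[var] <= cut (assumed convention)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0              # leaf parameter, used only at leaves


def tree_predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return that leaf's parameter."""
    while node.var is not None:
        node = node.left if x[node.var] <= node.cut else node.right
    return node.value


def sum_of_trees(trees: Sequence[Node], x: Sequence[float]) -> float:
    """Ensemble prediction: the sum of the individual tree outputs."""
    return sum(tree_predict(t, x) for t in trees)


# Two hypothetical weak learners with small leaf values.
stump = Node(var=0, cut=0.5, left=Node(value=-0.1), right=Node(value=0.2))
deeper = Node(
    var=1, cut=0.3,
    left=Node(value=0.05),
    right=Node(var=0, cut=0.7, left=Node(value=0.0), right=Node(value=0.3)),
)
ensemble = [stump, deeper]
```

Because the second tree splits on both predictors, its contribution mixes a main effect and an interaction, which is the lack of separation that the ANOVA structure is meant to address.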
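ANOVA-BART imposes a functional ANOVA structure on $f$:
$$f(x) = f_0 + \sum_{j} f_j(x_j) + \sum_{j<l} f_{jl}(x_j, x_l) + \cdots,$$
where $f_0$ is a constant, $f_j$ denotes a main effect, $f_{jl}$ a second-order interaction, and so on. Each component is then modeled as its own sum-of-trees. For example, for two predictors, this leads to the concrete representation
$$f(x_1, x_2) = f_0 + f_1(x_1) + f_2(x_2) + f_{12}(x_1, x_2),$$
with
$$f_1(x_1) = \sum_{t=1}^{m_1} g\bigl(x_1; T^{(1)}_t, M^{(1)}_t\bigr), \quad f_2(x_2) = \sum_{t=1}^{m_2} g\bigl(x_2; T^{(2)}_t, M^{(2)}_t\bigr), \quad f_{12}(x_1, x_2) = \sum_{t=1}^{m_{12}} g\bigl(x_1, x_2; T^{(12)}_t, M^{(12)}_t\bigr),$$
and with all tree ensembles regularized via priors to remain weak learners. Each functional component (main or interaction) is thus separately identified and estimated.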
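As an illustration of what a two-predictor functional ANOVA decomposition looks like numerically, the sketch below splits a function tabulated on a grid into a constant, two main effects, and an interaction. It assumes equal weights on the grid points (a uniform product measure) and uses a made-up test function; it illustrates the classical decomposition, not the tree-based estimator itself.

```python
import numpy as np


def anova_decompose_2d(F):
    """Decompose F[i, j] = f(x1_i, x2_j) under equal grid weights.

    Returns (f0, f1, f2, f12) with F = f0 + f1[:, None] + f2[None, :] + f12,
    where each main effect averages to zero and the interaction averages
    to zero along each axis.
    """
    f0 = F.mean()                                  # constant term
    f1 = F.mean(axis=1) - f0                       # main effect of x1
    f2 = F.mean(axis=0) - f0                       # main effect of x2
    f12 = F - f0 - f1[:, None] - f2[None, :]       # what remains is the interaction
    return f0, f1, f2, f12


# Hypothetical test function with two main effects and one interaction.
x1 = np.linspace(0.0, 1.0, 21)
x2 = np.linspace(0.0, 1.0, 21)
F = (np.sin(2 * np.pi * x1)[:, None]
     + x2[None, :] ** 2
     + 2.0 * x1[:, None] * x2[None, :])

f0, f1, f2, f12 = anova_decompose_2d(F)
```

The zero-mean constraints are what make the components unique; without them, part of a main effect could be moved into the interaction term without changing the sum.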
2. Statistical Model and Regularization
The ANOVA-BART model operates in a Bayesian framework, specifying a likelihood and hierarchical priors for all components. For observed data $\{(x_i, y_i)\}_{i=1}^{n}$, the response is modeled as
$$y_i = f(x_i) + \varepsilon_i,$$
with $\varepsilon_i \sim N(0, \sigma^2)$. The hierarchical priors enforce the weak learner property for each tree across all components:
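- The depth-dependent prior for node splitting enforces shallow, regularized trees:
$$P(\text{a node at depth } \ell \text{ splits}) = \alpha (1 + \ell)^{-\beta}, \qquad \alpha \in (0, 1),\ \beta \ge 0.$$
- Leaf (terminal node) values for each tree in each effect component group are given Gaussian priors, typically centered at 0, with variance scaled to promote shrinkage:
$$\mu \sim N\bigl(0, \sigma_{\mu,S}^2\bigr), \qquad \sigma_{\mu,S} \propto \frac{1}{k \sqrt{m_S}},$$
where $m_S$ is the number of trees for effect group $S$, and $k$ is a tuning parameter.
- For higher-order interactions, stronger shrinkage can be imposed by adjusting $k$ or scaling variances, paralleling hierarchical regularization in classical ANOVA.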
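The two regularizing priors above can be sketched numerically as follows. The functional forms and the defaults (`alpha=0.95`, `beta=2.0`, `k=2.0`, and the factor 0.5 for a response rescaled to unit range) are the usual choices for standard BART and are assumed here; the settings used in the ANOVA-BART paper may differ.

```python
import math


def split_probability(depth: int, alpha: float = 0.95, beta: float = 2.0) -> float:
    """Prior probability that a node at the given depth splits (standard BART form)."""
    return alpha * (1.0 + depth) ** (-beta)


def leaf_prior_sd(n_trees: int, k: float = 2.0, y_range: float = 1.0) -> float:
    """Prior standard deviation of a leaf value for an ensemble of n_trees trees.

    Larger k or more trees means stronger shrinkage of each leaf toward 0.
    """
    return 0.5 * y_range / (k * math.sqrt(n_trees))


# Splitting becomes rapidly less likely with depth, which keeps trees shallow.
split_probs = [split_probability(d) for d in range(4)]

# Hypothetical setup: a main-effect group with the default k and an
# interaction group shrunk harder through a larger k.
sd_main = leaf_prior_sd(n_trees=50, k=2.0)
sd_interaction = leaf_prior_sd(n_trees=50, k=4.0)
```

Doubling `k` halves the prior standard deviation of every leaf in the group, which is one simple way to express a preference for main effects over interactions.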
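The total model becomes
$$y_i = f_0 + \sum_{S \in \mathcal{S}} \sum_{t=1}^{m_S} g\bigl(x_{i,S}; T^{(S)}_t, M^{(S)}_t\bigr) + \varepsilon_i,$$
where $f_0$ is the constant term, $x_{i,S}$ is the sub-vector of $x_i$ with coordinates in $S$, and $\mathcal{S}$ indexes all main and interaction groups being modeled.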
3. Posterior Inference, MCMC, and Component Selection
Model fitting is performed via a blocked Gibbs or Metropolis-within-Gibbs MCMC, iteratively cycling through each tree in each component group:
- For each group $S$ (main or interaction), calculate partial residuals after subtracting the current sum across all other groups.
- For each tree in $S$, update the structure ($T$) with a Metropolis-Hastings proposal (grow, prune, change, swap).
- Leaf node parameters are updated from conjugate normal posteriors given the new structure and partial residuals.
- Update the error variance $\sigma^2$ from its conjugate inverse-chi-squared posterior.
- Posterior samples for each component $f_S$ yield direct estimates (mean, credible intervals) for all components.
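The cycle above can be sketched in a deliberately simplified form. The sketch below keeps the partial residuals, the conjugate normal leaf draws, and the conjugate variance draw, but it replaces the Metropolis-Hastings structure moves with fixed random partitions (one cut per variable in the group), and it does not enforce any identifiability constraints between groups. All function names, the inverse-gamma hyperparameters `a0` and `b0`, and the leaf prior scale are assumptions made for illustration.

```python
import numpy as np


def leaf_index(X, group, cuts):
    """Cell of each row in the 2**len(group) partition given one cut per variable."""
    idx = np.zeros(X.shape[0], dtype=int)
    for var, cut in zip(group, cuts):
        idx = 2 * idx + (X[:, var] > cut)
    return idx


def backfit_fixed_trees(X, y, groups, m=10, n_iter=200, k=2.0,
                        a0=1.0, b0=0.01, seed=0):
    """Gibbs backfitting with FIXED tree structures (no grow/prune moves)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    y0 = y.mean()
    yc = y - y0                                     # leaf priors are centered at 0
    sd_mu = 0.5 * (yc.max() - yc.min()) / (k * np.sqrt(m))

    trees = []                                      # (group id, leaf index, number of leaves)
    for g_id, group in enumerate(groups):
        cols = X[:, list(group)]
        for _ in range(m):
            cuts = rng.uniform(cols.min(axis=0), cols.max(axis=0))
            trees.append((g_id, leaf_index(X, group, cuts), 2 ** len(group)))

    fits = np.zeros((len(trees), n))                # current fit of every tree
    total = np.zeros(n)                             # current sum over all trees
    sigma2 = yc.var()
    comp_sum = np.zeros((len(groups), n))
    kept = 0

    for it in range(n_iter):
        for j, (g_id, idx, n_leaves) in enumerate(trees):
            total -= fits[j]
            resid = yc - total                      # partial residual for tree j
            n_leaf = np.bincount(idx, minlength=n_leaves)
            r_sum = np.bincount(idx, weights=resid, minlength=n_leaves)
            post_var = 1.0 / (n_leaf / sigma2 + 1.0 / sd_mu ** 2)
            post_mean = post_var * r_sum / sigma2
            mu = rng.normal(post_mean, np.sqrt(post_var))   # conjugate leaf draw
            fits[j] = mu[idx]
            total += fits[j]
        sse = np.sum((yc - total) ** 2)
        # sigma^2 ~ Inverse-Gamma(a0 + n/2, b0 + sse/2), drawn as 1 / Gamma.
        sigma2 = 1.0 / rng.gamma(shape=a0 + n / 2.0, scale=1.0 / (b0 + sse / 2.0))
        if it >= n_iter // 2:                       # discard the first half as burn-in
            for j, (g_id, _, _) in enumerate(trees):
                comp_sum[g_id] += fits[j]
            kept += 1

    return y0, comp_sum / kept, sigma2              # intercept, component means, last sigma^2
```

A call such as `backfit_fixed_trees(X, y, groups=[(0,), (1,), (0, 1)])` returns one posterior-mean curve per group evaluated at the training points. Because the structures are fixed and unconstrained, the split of signal between a main effect and an interaction containing the same variable is not identified in this sketch.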
Component selection is performed by examining posterior inclusion probabilities or effect sizes for each group. Shrinkage settings for higher-order groups can be tuned to automatically suppress unneeded interactions, enhancing model interpretability.
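One way to turn posterior draws into selection summaries is sketched below. It assumes each component's draws are stored as an array of shape (number of draws, number of evaluation points), measures effect size by the empirical L2 norm of each draw, and defines an "inclusion probability" as the fraction of draws whose norm exceeds a user-chosen threshold. Both the definition and the threshold are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np


def component_summary(draws, threshold):
    """Summarize posterior draws of each component.

    draws: dict mapping a component name to an array of shape
           (n_draws, n_points) holding the component's values.
    """
    summary = {}
    for name, d in draws.items():
        norms = np.sqrt(np.mean(d ** 2, axis=1))    # empirical L2 norm per draw
        summary[name] = {
            "effect_size": float(norms.mean()),
            "inclusion_prob": float(np.mean(norms > threshold)),
        }
    return summary


# Synthetic draws: a clear main effect and a negligible interaction.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
draws = {
    "x1": np.sin(2 * np.pi * grid) + rng.normal(scale=0.05, size=(200, 50)),
    "x1:x2": rng.normal(scale=0.01, size=(200, 50)),
}
summary = component_summary(draws, threshold=0.1)
```

Components whose inclusion probability stays near zero are candidates for removal, while the effect sizes give a rough ranking of the remaining ones.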
4. Theoretical Guarantees and Posterior Convergence
ANOVA-BART preserves the minimax optimality properties of standard BART for function estimation and extends these to interaction components. Specifically, the posterior concentration rate of the model is nearly minimax optimal, with the important property that each additive or interaction effect enjoys the same convergence rate as the overall function, a property not guaranteed by standard BART. This means that, for smooth $d$-dimensional functions of Hölder smoothness $\nu$, the posterior contracts at the rate $n^{-\nu/(2\nu + d)}$, up to logarithmic factors, for each component $f_S$.
This theoretical property ensures not only accurate recovery of the overall regression surface, but also reliable estimation and uncertainty quantification for each main effect and interaction. Thus, ANOVA-BART delivers finer-grained, structure-aware inference not available in aggregate BART posteriors.
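To give a feel for the magnitudes involved, the sketch below evaluates the classical nonparametric rate for a sample size n, a Hölder smoothness, and an input dimension. It assumes the rate has the usual form n^(-smoothness / (2 * smoothness + dim)) and ignores logarithmic factors and constants; the function name and the example values are hypothetical.

```python
def contraction_rate(n: int, smoothness: float, dim: int) -> float:
    """Classical rate n^(-smoothness / (2 * smoothness + dim)), constants and logs ignored."""
    return n ** (-smoothness / (2.0 * smoothness + dim))


# Same sample size and smoothness, increasing input dimension.
rates = {dim: contraction_rate(10_000, smoothness=2.0, dim=dim) for dim in (1, 2, 5, 10)}
```

For a fixed sample size and smoothness, the rate deteriorates as the dimension grows, which is why guarantees stated at the level of individual components are informative.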
5. Empirical Performance, Interpretability, and Comparison to BART
Experimental evidence confirms that ANOVA-BART achieves superior predictive accuracy and improved uncertainty quantification compared to standard BART, particularly in settings where the regression surface exhibits interpretable main effects or interactions. This performance improvement is attributable to:
- Enhanced identifiability arising from explicit decomposition;
- Direct posterior inference and shrinkage for each component, which prevents "masking" of effects by complex higher-order terms;
- Improved variable and component selection, allowing irrelevant effects to be pruned without sacrificing predictive ability.
Further, ANOVA-BART achieves these advances while maintaining computational scalability (since the overall model is modularized) and theoretical support for each constituent. The practical advantage is a model that is at once flexible, statistically efficient, and interpretable, facilitating discoveries such as which combinations of predictors drive response variability.
6. Extensions, Applications, and Future Directions
ANOVA-BART provides a structurally transparent scaffold for further model enrichment:
- Multilevel/Hierarchical Effects: Random or group effects can be integrated as additional additive or interaction components, either modeled parametrically or with their own tree ensembles.
- Heteroscedasticity and Non-Gaussian Outcomes: By altering the likelihood or adding BART-modeled scale functions, ANOVA-BART can be employed for heteroscedastic or exponential family response types, leveraging theory from generalized BART and heteroscedastic BART.
- Functional Data and High Dimensions: Functional extensions leverage the ability to decompose and regularize high-dimensional interactions, with relevant applications in genomics, spatial-temporal modeling, and causality where explicit effect decomposition is essential.
As a research area, ongoing directions include: data-driven selection of ANOVA component structure; hybridization with co-data or external prior information for effect groups; robust error modeling; and integration with density regression or causal inference frameworks.
In summary, ANOVA-BART offers a methodologically principled, empirically validated, and theoretically guaranteed framework for combining the adaptivity of BART with the interpretability and structured effect separation of ANOVA. This synthesis makes it a compelling choice for modeling and understanding complex interactions in modern regression and classification problems (Park et al., 3 Sep 2025, 0806.3286).