
Bayesian Backfitting MCMC in BART

Updated 20 November 2025
  • Bayesian Backfitting MCMC is a probabilistic framework that uses Gibbs and RJMCMC updates to efficiently infer tree structures in Bayesian additive regression trees (BART).
  • The methodology eliminates the need for closed-form conjugate priors by employing reversible-jump moves with Laplace approximations, enabling inference under arbitrary likelihoods.
  • This approach broadens BART's applicability to models like structured heteroskedastic regression, survival analysis, and gamma-shape regression while ensuring robust uncertainty quantification.

Bayesian Backfitting Markov Chain Monte Carlo (MCMC) refers to an MCMC-based inference framework for Bayesian additive regression trees (BART). In its original form, the algorithm exploits conditional conjugacy to enable efficient Gibbs-style updates for tree structures and leaf parameters, but this restricts applicability to models where closed-form posteriors exist. Recent advances, most notably the reversible-jump (RJ) extension, remove this restriction—allowing Bayesian backfitting to be deployed for arbitrary likelihoods using RJMCMC updates, provided only that the likelihood and (optionally) its gradient and curvature can be evaluated. This greatly expands the domain of BART, enabling its use in structured heteroskedastic regression, survival analysis, and many other modeling contexts (Linero, 2022).

1. Semiparametric Regression Setup for Bayesian Backfitting

BART models the conditional mean function in regression as a sum of tree-structured learners. For observed data $(X_i, Y_i)$, $i = 1, \dots, N$, the model assumes
$$Y_i \sim \mathcal{N}(r(X_i), \sigma^2), \qquad r(x) = \sum_{t=1}^T g(x; T_t, M_t),$$
where $g(x; T_t, M_t)$ denotes a regression tree, with $T_t$ the split structure and $M_t = \{\mu_{t\ell}\}$ the leaf means. Priors are typically set as follows: a branching-process prior for $T_t$ (with split probability at depth $d$ given by $\gamma(1+d)^{-\beta}$), independent Gaussian priors for the leaf values, and an improper (e.g., $\pi(\sigma) \propto 1/\sigma$) or proper inverse-gamma prior for the variance (Linero, 2022).
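To make the sum-of-trees representation concrete, here is a minimal evaluation sketch; the `Node` structure and function names are hypothetical illustrations, not from (Linero, 2022):

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Node:
    """Binary regression-tree node (a leaf when feature is None)."""
    feature: Optional[int] = None    # index of the split variable
    threshold: float = 0.0           # x[feature] <= threshold routes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mu: float = 0.0                  # leaf mean (used only at leaves)


def tree_predict(node: Node, x: np.ndarray) -> float:
    """Route x to a leaf and return g(x; T_t, M_t)."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.mu


def ensemble_predict(trees: list[Node], x: np.ndarray) -> float:
    """Sum-of-trees regression function r(x) = sum_t g(x; T_t, M_t)."""
    return sum(tree_predict(t, x) for t in trees)
```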

The joint posterior is

$$\pi\bigl(\{T_t, M_t\}_{t=1}^T, \sigma \mid X, Y\bigr) \propto \Bigl[\prod_{i=1}^N \mathcal{N}\bigl(Y_i \mid r(X_i), \sigma^2\bigr)\Bigr] \Bigl[\prod_{t=1}^T \pi_T(T_t)\,\pi_M(M_t)\Bigr]\,\pi(\sigma).$$

The backfitting MCMC update is performed one tree at a time, conditioning on the others through partial residuals:
$$R_i^{(-t)} = Y_i - \sum_{k \neq t} g(X_i; T_k, M_k), \qquad R_i^{(-t)} \sim \mathcal{N}\bigl(g(X_i; T_t, M_t), \sigma^2\bigr).$$
For each $t$, the tree structure is updated and its leaf means are drawn from closed-form Gaussian full conditionals, exploiting Normal–Normal conjugacy.
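For instance, under the standard $\mathcal{N}(0, \sigma_\mu^2)$ leaf prior, the full conditional for a leaf mean $\mu_\ell$, given the $n_\ell$ partial residuals routed to leaf $\ell$, is the familiar Normal–Normal posterior:
$$\mu_\ell \mid \cdot \;\sim\; \mathcal{N}\!\left(\frac{\sigma_\mu^2 \sum_{i:\, x_i \to \ell} R_i^{(-t)}}{n_\ell\,\sigma_\mu^2 + \sigma^2},\; \frac{\sigma_\mu^2\,\sigma^2}{n_\ell\,\sigma_\mu^2 + \sigma^2}\right).$$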

2. Limitations of Conditional Conjugacy

The standard backfitting algorithm relies on the ability to compute, for each tree and leaf node, an integrated likelihood of the form

$$\int \pi_M(\mu)\,\prod_i f(Y_i \mid \lambda_i + \mu)\,d\mu$$

in closed form. This is possible for a small set of likelihood–prior pairs (e.g., Gaussian–Gaussian, Poisson–log-gamma, some standard survival models) but is intractable in most settings, including generalized linear models with nonconjugate random effects, beta-binomial models, and others (Linero, 2022). When conditional conjugacy fails, bespoke solutions or quadrature are required; otherwise BART cannot practically be used with such likelihoods.

3. RJMCMC Backfitting: Removing Conjugacy Restrictions

Linero (2022) introduced an RJMCMC-based extension that allows arbitrary likelihood models in the Bayesian backfitting context. In this framework, the tree structure $T_t$ and the leaf values $M_t$ are updated jointly using reversible-jump MCMC, enabling Bayesian backfitting without closed-form marginalization. The requirements are:

  • Evaluability of the tree likelihood $\ell(T_t, M_t)$, where

$$\ell(T_t, M_t) = \prod_{\ell \in \mathrm{leaves}(T_t)} \prod_{i:\, x_i \to \ell} f_\eta\bigl(Y_i \mid \lambda_i + \mu_{t\ell}\bigr)$$

  • Ability to compute (ideally) the score $U_\eta(y \mid \lambda) = \partial_\lambda \log f_\eta(y \mid \lambda)$ and Fisher information $\mathcal{I}_\eta = -\partial_\lambda^2 \log f_\eta(y \mid \lambda)$; finite-difference approximations suffice otherwise (see the interface sketch below).
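As an illustration of what the user must supply, here is a minimal likelihood-interface sketch for a Poisson model with log link; the class and method names are hypothetical, not part of any published API:

```python
import numpy as np
from scipy.special import gammaln


class PoissonLogLink:
    """Poisson likelihood with log link: Y ~ Poisson(exp(eta)).

    Supplies exactly what the RJ backfitting updates need:
    the log-density, its score, and the Fisher information in eta.
    """

    def log_density(self, y: np.ndarray, eta: np.ndarray) -> np.ndarray:
        # log f(y | eta) = y*eta - exp(eta) - log(y!)
        return y * eta - np.exp(eta) - gammaln(y + 1.0)

    def score(self, y: np.ndarray, eta: np.ndarray) -> np.ndarray:
        # U(y | eta) = d/d eta log f = y - exp(eta)
        return y - np.exp(eta)

    def fisher_info(self, y: np.ndarray, eta: np.ndarray) -> np.ndarray:
        # I(eta) = -E[d^2/d eta^2 log f] = exp(eta)
        return np.exp(eta)
```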

The RJMCMC moves are:

  • Birth: Propose splitting a leaf, sampling a new split and independent values for child leaves from a Laplace-approximated local Gaussian.
  • Death: Propose collapsing two sibling leaves into their parent, drawing the merged leaf's value from its Laplace-approximated Gaussian.
  • Change: Propose resampling the split rule for an internal node and redrawing leaf values.

Acceptance ratios follow the Green (1995) RJMCMC framework. Proposition 1 in (Linero, 2022) gives explicit formulas, where (for a birth move)

$$R_{\mathrm{birth}} = \frac{\rho_d\,(1-\rho_{d+1})^2}{1-\rho_d} \times \frac{f(\ell L \mid \lambda, \mu_{\ell L}')\, f(\ell R \mid \lambda, \mu_{\ell R}')}{f(\ell \mid \lambda, \mu_\ell)} \times \frac{p_{\mathrm{death}} / |\mathrm{NoGrandparentNodes}(T_t')|}{p_{\mathrm{birth}} / |\mathrm{Leaves}(T_t)|} \times \frac{G_{\mathrm{death}}(\mu_\ell)}{G_{\mathrm{birth}}(\mu_{\ell L}', \mu_{\ell R}')}$$

with analogous ratios for death and change.
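On the log scale, the four factors of $R_{\mathrm{birth}}$ assemble term by term. A schematic sketch, assuming all likelihood and proposal-density values have been precomputed (function and argument names are hypothetical):

```python
import numpy as np


def log_birth_ratio(d, gamma, beta,
                    loglik_children, loglik_parent,
                    n_nog_after, n_leaves_before,
                    p_birth, p_death,
                    log_g_death, log_g_birth):
    """Log acceptance ratio for a birth move, mirroring R_birth factor by factor.

    rho_d = gamma * (1 + d)^(-beta) is the branching-process split
    probability at depth d, where d is the depth of the leaf being split.
    """
    rho_d = gamma * (1.0 + d) ** (-beta)
    rho_d1 = gamma * (2.0 + d) ** (-beta)
    log_prior = np.log(rho_d) + 2.0 * np.log1p(-rho_d1) - np.log1p(-rho_d)
    log_lik = loglik_children - loglik_parent          # f(lL) f(lR) / f(l)
    log_move = (np.log(p_death) - np.log(n_nog_after)) \
             - (np.log(p_birth) - np.log(n_leaves_before))
    log_prop = log_g_death - log_g_birth               # G_death / G_birth
    return log_prior + log_lik + log_move + log_prop
```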

Leaf proposals $G(\cdot)$ are constructed via Laplace approximations. At a new leaf $\ell$, $m_\ell$ approximates the mode and $v_\ell^{-2}$ the curvature of the local posterior:
$$m_\ell = \underset{\mu}{\arg\max}\; \sum_{i:\, x_i \to \ell} \log f_\eta(Y_i \mid \lambda_i + \mu) + \log \pi_M(\mu),$$

$$v_\ell^{-2} = -\left.\frac{d^2}{d\mu^2}\left[\sum_{i:\, x_i \to \ell} \log f_\eta(Y_i \mid \lambda_i + \mu) + \log \pi_M(\mu)\right]\right|_{\mu = m_\ell},$$

so that $G(\mu) \approx \mathcal{N}(m_\ell, v_\ell^2)$.
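A sketch of this construction, assuming a $\mathcal{N}(0, \sigma_\mu^2)$ leaf prior and a likelihood object with the hypothetical interface above; a few Newton steps locate the mode, and the curvature there sets the proposal scale:

```python
import numpy as np


def laplace_leaf_proposal(lik, y, lam, sigma_mu, n_newton=5):
    """Laplace-approximate the local posterior of a leaf mean mu.

    y, lam hold the responses and offsets lambda_i of the observations
    routed to the leaf; returns (m, v) so that G(mu) ~= N(m, v^2).
    """
    mu = 0.0
    prior_prec = 1.0 / sigma_mu**2   # assumes a N(0, sigma_mu^2) leaf prior
    for _ in range(n_newton):
        grad = np.sum(lik.score(y, lam + mu)) - prior_prec * mu
        hess = np.sum(lik.fisher_info(y, lam + mu)) + prior_prec
        mu += grad / hess            # Newton step toward the mode m_ell
    v = 1.0 / np.sqrt(np.sum(lik.fisher_info(y, lam + mu)) + prior_prec)
    return mu, v                     # propose mu' ~ N(m, v^2)
```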

4. Algorithmic Workflow and Pseudocode

The RJMCMC-based Bayesian backfitting algorithm, per (Linero, 2022), proceeds as follows for each tree $t$ at each iteration:

  1. Compute the partial-residual offsets $\lambda_i = \sum_{k \neq t} g(X_i; T_k, M_k)$.
  2. Select a move type with specified probabilities $(p_{\mathrm{birth}}, p_{\mathrm{death}}, p_{\mathrm{change}})$.
  3. Propose $(T_t', M_t')$ via the chosen move, with new leaf means sampled from their Laplace-approximated proposals $G$.
  4. Compute the acceptance ratio $R$ (Proposition 1).
  5. Accept $(T_t', M_t')$ with probability $\min(1, R)$; otherwise retain the current $(T_t, M_t)$.

After all trees are updated, global (nuisance) parameters (e.g., variance, shape) are sampled from their respective full conditionals, for example via slice sampling for $\sigma$.
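Putting the pieces together, a high-level sketch of one sweep; `propose_move` is a hypothetical helper implementing the birth/death/change proposals and the ratios of Proposition 1, and `tree_predict` is the evaluation routine sketched earlier:

```python
import numpy as np


def backfitting_sweep(trees, X, y, lik, rng, p_move=(0.4, 0.4, 0.2)):
    """One RJMCMC backfitting iteration over all T trees."""
    fits = np.array([[tree_predict(t, x) for x in X] for t in trees])
    for t, tree in enumerate(trees):
        lam = fits.sum(axis=0) - fits[t]      # offsets lambda_i from the other trees
        move = rng.choice(["birth", "death", "change"], p=p_move)
        proposal, log_R = propose_move(tree, move, X, y, lam, lik, rng)
        if np.log(rng.uniform()) < log_R:     # accept with probability min(1, R)
            trees[t] = proposal
            fits[t] = np.array([tree_predict(proposal, x) for x in X])
    # global/nuisance parameters (e.g., sigma) are then updated, e.g. by slice sampling
    return trees
```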

Key complexity and tuning characteristics:

  • No tuning is required beyond the standard BART hyperparameters $(T, \gamma, \beta, s)$.
  • The computational cost per iteration is $O(\sum_{t=1}^T n_t) \approx O(NT)$.
  • Empirically, RJMCMC-backfitting exhibits mixing comparable to the conjugate scenario (Linero, 2022).

5. Detailed Balance, Dimension Matching, and Theoretical Properties

Each RJMCMC proposal alters the dimension of $M_t$ by adding or removing one leaf value; detailed balance is preserved via Green's (1995) dimension-matching construction. For the birth move, the required bijective mapping in parameter space is implemented by discarding a single $\mu_\ell$ and replacing it with independent $\mu_{\ell L}', \mu_{\ell R}'$; the Jacobian is 1. The acceptance probability

$$R = \frac{\pi(\theta')\, q(\theta \mid \theta')}{\pi(\theta)\, q(\theta' \mid \theta)}\, |\det J|$$

specializes precisely to the ratios of Proposition 1 (Linero, 2022). The use of Laplace-based proposals $G$ ensures automatic adaptation to the target density's local geometry, obviating the need for user-tuned proposal scales.

6. Representative Applications and Implementation Examples

The RJMCMC paradigm extends BART to a broad spectrum of models:

  • Structured heteroskedastic regression: models of the form $Y_i \sim \mathcal{N}(m_i, \phi V(m_i))$ with $m_i = e^{r(X_i)}$ and $V(m) = m$, applied to Poisson data.
  • Accelerated-failure-time survival models: generalized gamma and log-logistic forms, supporting covariate-dependent shape parameters.
  • Gamma-shape regression: $Y_i \sim \mathrm{Gamma}(\alpha_i, \beta)$ with $\alpha_i = e^{r(X_i)}$.

In all scenarios, the user need only provide $\log f_\eta$, its gradient, and optionally the Fisher information. The implementation reuses the same RJMCMC backfitting routine, with posterior sampling over $r(x)$ and global/nuisance parameters proceeding as in standard BART (Linero, 2022).
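For the gamma-shape model, for example, these ingredients take the following form; a sketch under the stated parametrization (shape $\alpha = e^\eta$, global rate $\beta$), with hypothetical class and method names matching the interface above:

```python
import numpy as np
from scipy.special import digamma, gammaln, polygamma


class GammaShapeLogLink:
    """Gamma likelihood Y ~ Gamma(shape=alpha, rate=beta) with alpha = exp(eta),
    where eta = r(x) is the BART regression function and beta is a global
    nuisance parameter (updated separately, e.g. by slice sampling)."""

    def __init__(self, beta: float):
        self.beta = beta

    def log_density(self, y, eta):
        a = np.exp(eta)
        return (a * np.log(self.beta) - gammaln(a)
                + (a - 1.0) * np.log(y) - self.beta * y)

    def score(self, y, eta):
        # chain rule through alpha = exp(eta):
        # d log f / d eta = alpha * (log beta - psi(alpha) + log y)
        a = np.exp(eta)
        return a * (np.log(self.beta) - digamma(a) + np.log(y))

    def fisher_info(self, y, eta):
        # expected information in eta: alpha^2 * psi_1(alpha) (trigamma)
        a = np.exp(eta)
        return a * a * polygamma(1, a)
```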

7. Significance and Broader Impact

By removing the conditional conjugacy restriction, RJMCMC-based Bayesian backfitting enables BART to be used in an extensive range of applications—beyond what classic data augmentation or tailored MCMC methods can support. This approach preserves the interpretability and flexible uncertainty quantification of tree ensembles, while ensuring practical applicability to contemporary statistical models in regression, survival analysis, and structured heteroskedasticity (Linero, 2022). A plausible implication is that further generalizations to other nonparametric ensemble priors may be straightforward, given only evaluability of the requisite likelihood and its derivatives.
