
Bayesian Backfitting MCMC in BART

Updated 20 November 2025
  • Bayesian Backfitting MCMC is a probabilistic framework that uses Gibbs and RJMCMC updates to efficiently infer tree structures in Bayesian additive regression trees (BART).
  • The methodology eliminates the need for closed-form conjugate priors by employing reversible-jump moves with Laplace approximations, enabling inference under arbitrary likelihoods.
  • This approach broadens BART's applicability to models like structured heteroskedastic regression, survival analysis, and gamma-shape regression while ensuring robust uncertainty quantification.

Bayesian Backfitting Markov Chain Monte Carlo (MCMC) refers to an MCMC-based inference framework for Bayesian additive regression trees (BART). In its original form, the algorithm exploits conditional conjugacy to enable efficient Gibbs-style updates for tree structures and leaf parameters, but this restricts applicability to models where closed-form posteriors exist. Recent advances, most notably the reversible-jump (RJ) extension, remove this restriction—allowing Bayesian backfitting to be deployed for arbitrary likelihoods using RJMCMC updates, provided only that the likelihood and (optionally) its gradient and curvature can be evaluated. This greatly expands the domain of BART, enabling its use in structured heteroskedastic regression, survival analysis, and many other modeling contexts (Linero, 2022).

1. Semiparametric Regression Setup for Bayesian Backfitting

BART models the conditional mean function in regression as a sum of tree-structured learners. For observed data $(X_i, Y_i)$, $i = 1, \dots, N$, the model assumes
$$Y_i \sim \mathcal{N}(r(X_i), \sigma^2), \qquad r(x) = \sum_{t=1}^T g(x; T_t, M_t),$$
where $g(x; T_t, M_t)$ denotes a regression tree, with $T_t$ the split structure and $M_t = \{\mu_{t\ell}\}$ the leaf means. Priors are typically set as follows: a branching-process prior for $T_t$ (with split probability at depth $d$ given by $\gamma(1+d)^{-\beta}$), independent Gaussian priors for the leaf values, and an improper (e.g., $\pi(\sigma) \propto 1/\sigma$) or proper inverse-gamma prior for the variance (Linero, 2022).
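To make the sum-of-trees representation concrete, here is a minimal evaluation sketch; the `Node` structure and function names are hypothetical illustrations, not from (Linero, 2022):

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Node:
    """Binary regression-tree node (a leaf when feature is None)."""
    feature: Optional[int] = None    # index of the split variable
    threshold: float = 0.0           # x[feature] <= threshold routes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mu: float = 0.0                  # leaf mean (used only at leaves)


def tree_predict(node: Node, x: np.ndarray) -> float:
    """Route x to a leaf and return g(x; T_t, M_t)."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.mu


def ensemble_predict(trees: list[Node], x: np.ndarray) -> float:
    """Sum-of-trees regression function r(x) = sum_t g(x; T_t, M_t)."""
    return sum(tree_predict(t, x) for t in trees)
```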

The joint posterior is

$$\pi\bigl(\{T_t, M_t\}_{t=1}^T, \sigma \mid X, Y\bigr) \propto \Bigl[\prod_{i=1}^N \mathcal{N}\bigl(Y_i \mid r(X_i), \sigma^2\bigr)\Bigr] \Bigl[\prod_{t=1}^T \pi_T(T_t)\,\pi_M(M_t)\Bigr]\,\pi(\sigma).$$

The backfitting MCMC update is performed one tree at a time, conditioning on the others through partial residuals:
$$R_i^{(-t)} = Y_i - \sum_{k \neq t} g(X_i; T_k, M_k), \qquad R_i^{(-t)} \sim \mathcal{N}\bigl(g(X_i; T_t, M_t), \sigma^2\bigr).$$
For each $t$, the tree structure is updated and its leaf means are drawn from closed-form Gaussian full conditionals, exploiting Normal–Normal conjugacy.
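For instance, under the standard $\mathcal{N}(0, \sigma_\mu^2)$ leaf prior, the full conditional for a leaf mean $\mu_\ell$, given the $n_\ell$ partial residuals routed to leaf $\ell$, is the familiar Normal–Normal posterior:
$$\mu_\ell \mid \cdot \;\sim\; \mathcal{N}\!\left(\frac{\sigma_\mu^2 \sum_{i:\, x_i \to \ell} R_i^{(-t)}}{n_\ell\,\sigma_\mu^2 + \sigma^2},\; \frac{\sigma_\mu^2\,\sigma^2}{n_\ell\,\sigma_\mu^2 + \sigma^2}\right).$$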

2. Limitations of Conditional Conjugacy

The standard backfitting algorithm relies on the ability to compute, for each tree and leaf node, an integrated likelihood of the form

$$\int \pi_M(\mu)\,\prod_i f(Y_i \mid \lambda_i + \mu)\,d\mu$$

in closed form. This is possible for a small set of likelihood–prior pairs (e.g., Gaussian–Gaussian, Poisson–log-gamma, some standard survival models) but is intractable in most settings, including generalized linear models with nonconjugate random effects, beta-binomial models, and others (Linero, 2022). When conditional conjugacy fails, bespoke solutions or quadrature are required; otherwise BART cannot practically be used with such likelihoods.

3. RJMCMC Backfitting: Removing Conjugacy Restrictions

Linero (2022) introduced an RJMCMC-based extension that allows arbitrary likelihood models in the Bayesian backfitting context. In this framework, the tree structure $T_t$ and the leaf values $M_t$ are updated jointly using reversible-jump MCMC, enabling Bayesian backfitting without closed-form marginalization. The requirements are:

  • Evaluability of the tree likelihood $\ell(T_t, M_t)$, where

$$\ell(T_t, M_t) = \prod_{\ell \in \mathrm{leaves}(T_t)} \prod_{i:\, x_i \to \ell} f_\eta\bigl(Y_i \mid \lambda_i + \mu_{t\ell}\bigr)$$

  • Ability to compute (ideally) the score $U_\eta(y \mid \lambda) = \partial_\lambda \log f_\eta(y \mid \lambda)$ and Fisher information $\mathcal{I}_\eta = -\partial_\lambda^2 \log f_\eta(y \mid \lambda)$; finite-difference approximations suffice otherwise (see the interface sketch below).
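As an illustration of what the user must supply, here is a minimal likelihood-interface sketch for a Poisson model with log link; the class and method names are hypothetical, not part of any published API:

```python
import numpy as np
from scipy.special import gammaln


class PoissonLogLink:
    """Poisson likelihood with log link: Y ~ Poisson(exp(eta)).

    Supplies exactly what the RJ backfitting updates need:
    the log-density, its score, and the Fisher information in eta.
    """

    def log_density(self, y: np.ndarray, eta: np.ndarray) -> np.ndarray:
        # log f(y | eta) = y*eta - exp(eta) - log(y!)
        return y * eta - np.exp(eta) - gammaln(y + 1.0)

    def score(self, y: np.ndarray, eta: np.ndarray) -> np.ndarray:
        # U(y | eta) = d/d eta log f = y - exp(eta)
        return y - np.exp(eta)

    def fisher_info(self, y: np.ndarray, eta: np.ndarray) -> np.ndarray:
        # I(eta) = -E[d^2/d eta^2 log f] = exp(eta)
        return np.exp(eta)
```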

The RJMCMC moves are:

  • Birth: Propose splitting a leaf, sampling a new split and independent values for child leaves from a Laplace-approximated local Gaussian.
  • Death: Propose collapsing two sibling leaves into their parent, drawing the merged leaf's value from its Laplace-approximated Gaussian.
  • Change: Propose resampling the split rule for an internal node and redrawing leaf values.

Acceptance ratios follow the Green (1995) RJMCMC framework. Proposition 1 in (Linero, 2022) gives explicit formulas, where (for a birth move)

$$R_{\mathrm{birth}} = \frac{\rho_d\,(1-\rho_{d+1})^2}{1-\rho_d} \times \frac{f(\ell L \mid \lambda, \mu_{\ell L}')\, f(\ell R \mid \lambda, \mu_{\ell R}')}{f(\ell \mid \lambda, \mu_\ell)} \times \frac{p_{\mathrm{death}} / |\mathrm{NoGrandparentNodes}(T_t')|}{p_{\mathrm{birth}} / |\mathrm{Leaves}(T_t)|} \times \frac{G_{\mathrm{death}}(\mu_\ell)}{G_{\mathrm{birth}}(\mu_{\ell L}', \mu_{\ell R}')}$$

with analogous ratios for death and change.
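On the log scale, the four factors of $R_{\mathrm{birth}}$ assemble term by term. A schematic sketch, assuming all likelihood and proposal-density values have been precomputed (function and argument names are hypothetical):

```python
import numpy as np


def log_birth_ratio(d, gamma, beta,
                    loglik_children, loglik_parent,
                    n_nog_after, n_leaves_before,
                    p_birth, p_death,
                    log_g_death, log_g_birth):
    """Log acceptance ratio for a birth move, mirroring R_birth factor by factor.

    rho_d = gamma * (1 + d)^(-beta) is the branching-process split
    probability at depth d, where d is the depth of the leaf being split.
    """
    rho_d = gamma * (1.0 + d) ** (-beta)
    rho_d1 = gamma * (2.0 + d) ** (-beta)
    log_prior = np.log(rho_d) + 2.0 * np.log1p(-rho_d1) - np.log1p(-rho_d)
    log_lik = loglik_children - loglik_parent          # f(lL) f(lR) / f(l)
    log_move = (np.log(p_death) - np.log(n_nog_after)) \
             - (np.log(p_birth) - np.log(n_leaves_before))
    log_prop = log_g_death - log_g_birth               # G_death / G_birth
    return log_prior + log_lik + log_move + log_prop
```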

Leaf proposals $G(\cdot)$ are constructed via Laplace approximations. At a new leaf $\ell$, $m_\ell$ approximates the mode and $v_\ell^{-2}$ the curvature of the local posterior:
$$m_\ell = \underset{\mu}{\arg\max}\; \sum_{i:\, x_i \to \ell} \log f_\eta(Y_i \mid \lambda_i + \mu) + \log \pi_M(\mu),$$

$$v_\ell^{-2} = -\left.\frac{d^2}{d\mu^2}\left[\sum_{i:\, x_i \to \ell} \log f_\eta(Y_i \mid \lambda_i + \mu) + \log \pi_M(\mu)\right]\right|_{\mu = m_\ell},$$

so that $G(\mu) \approx \mathcal{N}(m_\ell, v_\ell^2)$.
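A sketch of this construction, assuming a $\mathcal{N}(0, \sigma_\mu^2)$ leaf prior and a likelihood object with the hypothetical interface above; a few Newton steps locate the mode, and the curvature there sets the proposal scale:

```python
import numpy as np


def laplace_leaf_proposal(lik, y, lam, sigma_mu, n_newton=5):
    """Laplace-approximate the local posterior of a leaf mean mu.

    y, lam hold the responses and offsets lambda_i of the observations
    routed to the leaf; returns (m, v) so that G(mu) ~= N(m, v^2).
    """
    mu = 0.0
    prior_prec = 1.0 / sigma_mu**2   # assumes a N(0, sigma_mu^2) leaf prior
    for _ in range(n_newton):
        grad = np.sum(lik.score(y, lam + mu)) - prior_prec * mu
        hess = np.sum(lik.fisher_info(y, lam + mu)) + prior_prec
        mu += grad / hess            # Newton step toward the mode m_ell
    v = 1.0 / np.sqrt(np.sum(lik.fisher_info(y, lam + mu)) + prior_prec)
    return mu, v                     # propose mu' ~ N(m, v^2)
```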

4. Algorithmic Workflow and Pseudocode

The RJMCMC-based Bayesian backfitting algorithm, per (Linero, 2022), proceeds as follows for each tree $t$ at each iteration:

  1. Compute the partial-residual offsets $\lambda_i = \sum_{k \neq t} g(X_i; T_k, M_k)$.
  2. Select a move type with specified probabilities $(p_{\mathrm{birth}}, p_{\mathrm{death}}, p_{\mathrm{change}})$.
  3. Propose $(T_t', M_t')$ via the chosen move, with new leaf means sampled from their Laplace-approximated proposals $G$.
  4. Compute the acceptance ratio $R$ (Proposition 1).
  5. Accept $(T_t', M_t')$ with probability $\min(1, R)$; otherwise retain the current $(T_t, M_t)$.

After all trees are updated, global (nuisance) parameters (e.g., variance, shape) are sampled from their respective full conditionals, for example via slice sampling for $\sigma$.
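Putting the pieces together, a high-level sketch of one sweep; `propose_move` is a hypothetical helper implementing the birth/death/change proposals and the ratios of Proposition 1, and `tree_predict` is the evaluation routine sketched earlier:

```python
import numpy as np


def backfitting_sweep(trees, X, y, lik, rng, p_move=(0.4, 0.4, 0.2)):
    """One RJMCMC backfitting iteration over all T trees."""
    fits = np.array([[tree_predict(t, x) for x in X] for t in trees])
    for t, tree in enumerate(trees):
        lam = fits.sum(axis=0) - fits[t]      # offsets lambda_i from the other trees
        move = rng.choice(["birth", "death", "change"], p=p_move)
        proposal, log_R = propose_move(tree, move, X, y, lam, lik, rng)
        if np.log(rng.uniform()) < log_R:     # accept with probability min(1, R)
            trees[t] = proposal
            fits[t] = np.array([tree_predict(proposal, x) for x in X])
    # global/nuisance parameters (e.g., sigma) are then updated, e.g. by slice sampling
    return trees
```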

Key complexity and tuning characteristics:

  • No tuning is required beyond the standard BART hyperparameters $(T, \gamma, \beta, s)$.
  • The computational cost per iteration is $O(\sum_{t=1}^T n_t) \approx O(NT)$.
  • Empirically, RJMCMC-backfitting exhibits mixing comparable to the conjugate scenario (Linero, 2022).

5. Detailed Balance, Dimension Matching, and Theoretical Properties

Each RJMCMC proposal alters the dimension of $M_t$ by adding or removing one leaf value; detailed balance is preserved via Green's (1995) dimension-matching construction. For the birth move, the required bijective mapping in parameter space is implemented by discarding a single $\mu_\ell$ and replacing it with independent $\mu_{\ell L}', \mu_{\ell R}'$; the Jacobian is 1. The acceptance probability

$$R = \frac{\pi(\theta')\, q(\theta \mid \theta')}{\pi(\theta)\, q(\theta' \mid \theta)}\, |\det J|$$

specializes precisely to the ratios of Proposition 1 (Linero, 2022). The use of Laplace-based proposals $G$ ensures automatic adaptation to the target density's local geometry, obviating the need for user-tuned proposal scales.

6. Representative Applications and Implementation Examples

The RJMCMC paradigm extends BART to a broad spectrum of models:

  • Structured heteroskedastic regression: models of the form $Y_i \sim \mathcal{N}(m_i, \phi V(m_i))$ with $m_i = e^{r(X_i)}$ and $V(m) = m$, applied to Poisson data.
  • Accelerated-failure-time survival models: generalized gamma and log-logistic forms, supporting covariate-dependent shape parameters.
  • Gamma-shape regression: $Y_i \sim \mathrm{Gamma}(\alpha_i, \beta)$ with $\alpha_i = e^{r(X_i)}$.

In all scenarios, the user need only provide $\log f_\eta$, its gradient, and optionally the Fisher information. The implementation reuses the same RJMCMC backfitting routine, with posterior sampling over $r(x)$ and global/nuisance parameters proceeding as in standard BART (Linero, 2022).
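For the gamma-shape model, for example, these ingredients take the following form; a sketch under the stated parametrization (shape $\alpha = e^\eta$, global rate $\beta$), with hypothetical class and method names matching the interface above:

```python
import numpy as np
from scipy.special import digamma, gammaln, polygamma


class GammaShapeLogLink:
    """Gamma likelihood Y ~ Gamma(shape=alpha, rate=beta) with alpha = exp(eta),
    where eta = r(x) is the BART regression function and beta is a global
    nuisance parameter (updated separately, e.g. by slice sampling)."""

    def __init__(self, beta: float):
        self.beta = beta

    def log_density(self, y, eta):
        a = np.exp(eta)
        return (a * np.log(self.beta) - gammaln(a)
                + (a - 1.0) * np.log(y) - self.beta * y)

    def score(self, y, eta):
        # chain rule through alpha = exp(eta):
        # d log f / d eta = alpha * (log beta - psi(alpha) + log y)
        a = np.exp(eta)
        return a * (np.log(self.beta) - digamma(a) + np.log(y))

    def fisher_info(self, y, eta):
        # expected information in eta: alpha^2 * psi_1(alpha) (trigamma)
        a = np.exp(eta)
        return a * a * polygamma(1, a)
```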

7. Significance and Broader Impact

By removing the conditional conjugacy restriction, RJMCMC-based Bayesian backfitting enables BART to be used in an extensive range of applications—beyond what classic data augmentation or tailored MCMC methods can support. This approach preserves the interpretability and flexible uncertainty quantification of tree ensembles, while ensuring practical applicability to contemporary statistical models in regression, survival analysis, and structured heteroskedasticity (Linero, 2022). A plausible implication is that further generalizations to other nonparametric ensemble priors may be straightforward, given only evaluability of the requisite likelihood and its derivatives.
