Bayesian Dynamic Trees

Updated 16 April 2026
  • Bayesian Dynamic Trees are a fully Bayesian, nonparametric framework that partitions the input space into axis-aligned regions with simple parametric models in each leaf.
  • They use sequential Monte Carlo for online inference, incorporating active data retirement and a forgetting mechanism to manage memory in streaming settings.
  • The approach adaptively handles non-stationarity and achieves competitive performance in regression and classification tasks with efficient, bounded computational complexity.

Bayesian Dynamic Trees (DTs) constitute a fully Bayesian, non-parametric modeling framework suitable for streaming and massive data settings. DTs maintain piecewise simple parametric models on axis-aligned partitioned subspaces while employing sequential Monte Carlo (SMC) for online inference. Key techniques include active data retirement with conjugate updating and explicit forgetting mechanisms, resulting in bounded memory and computational complexity per data point. DTs achieve adaptivity to non-stationarity and remain competitive with state-of-the-art streaming algorithms in both regression and classification tasks (Anagnostopoulos et al., 2012).

1. Model Structure

Static treed models partition the input space $X \subset \mathbb{R}^p$ into axis-aligned hyperrectangles (“leaves”) via recursive binary splits of the form $x_j \geq c$. A tree $T$ consists of internal nodes $I_T$ and leaves $\mathcal{L}_T$; for every $x \in X$, $\eta(x) \in \mathcal{L}_T$ denotes the unique leaf containing $x$.
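The leaf lookup $\eta(x)$ can be illustrated with a minimal sketch. The `Node` class and `find_leaf` function are hypothetical names introduced here for illustration, and the convention that the right child holds the region where the split condition is true is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Internal nodes carry a split (dim, cut); leaves leave both as None.
    dim: Optional[int] = None
    cut: Optional[float] = None
    left: Optional["Node"] = None   # region where x[dim] < cut
    right: Optional["Node"] = None  # region where x[dim] >= cut

    def is_leaf(self) -> bool:
        return self.dim is None


def find_leaf(root: Node, x: Sequence[float]) -> Node:
    """Return eta(x) by following splits of the form x[dim] >= cut."""
    node = root
    while not node.is_leaf():
        node = node.right if x[node.dim] >= node.cut else node.left
    return node


# A tree with three leaves: split on x[0] >= 0.5, then on x[1] >= 2.0.
leaf_a, leaf_b, leaf_c = Node(), Node(), Node()
inner = Node(dim=1, cut=2.0, left=leaf_b, right=leaf_c)
root = Node(dim=0, cut=0.5, left=leaf_a, right=inner)
```

Every input reaches exactly one leaf, which is what makes the partition a valid basis for independent leaf models.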

Within each leaf $\eta$, a simple parametric model is fitted:

  • Regression: $y \mid x, T \sim \mathcal{N}(\beta_\eta^\top x + \mu_\eta, \sigma^2_\eta)$, with a noninformative prior $\pi(\beta_\eta, \mu_\eta, \sigma^2_\eta) \propto 1/\sigma^2_\eta$.
  • Classification: $y \mid x, T \sim \mathrm{Multinomial}(p_\eta)$, with a Dirichlet prior on the class probabilities $p_\eta$.

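The Bayesian prior on tree structures is a recursive split probability: for each leaf $\eta$ at depth $D_\eta$, $p_{\mathrm{split}}(T, \eta) = \alpha (1 + D_\eta)^{-\beta}$, with $\alpha, \beta > 0$. The prior on $T$ is then $\pi(T) \propto \prod_{\eta \in I_T} p_{\mathrm{split}}(T, \eta) \prod_{\eta \in \mathcal{L}_T} [1 - p_{\mathrm{split}}(T, \eta)]$.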
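As a rough illustration of a depth-dependent tree prior, the sketch below evaluates an unnormalized log prior. It assumes the split probability takes the common form $\alpha (1 + D)^{-\beta}$; that form, the default values of `alpha` and `beta`, and the nested-tuple tree encoding are all assumptions made for this example.

```python
import math


def p_split(depth, alpha=0.95, beta=2.0):
    """Assumed split probability for a node at the given depth."""
    return alpha * (1.0 + depth) ** (-beta)


def log_tree_prior(tree, depth=0, alpha=0.95, beta=2.0):
    """Unnormalized log prior of a tree.

    A tree is None for a leaf, or a (left, right) tuple for an internal node.
    Internal nodes contribute log p_split; leaves contribute log(1 - p_split).
    """
    p = p_split(depth, alpha, beta)
    if tree is None:
        return math.log1p(-p)
    left, right = tree
    return (math.log(p)
            + log_tree_prior(left, depth + 1, alpha, beta)
            + log_tree_prior(right, depth + 1, alpha, beta))


stump = None             # a single leaf
one_split = (None, None)  # one internal node with two leaves
```

Because the split probability shrinks with depth, deeper trees are penalized, which keeps the posterior from favoring needlessly fine partitions.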

Dynamic operation is defined by local “grow”, “prune”, or “stay” moves triggered only at the leaf $\eta(x_t)$ where the new datum $(x_t, y_t)$ resides. Split dimension and cut-point for a “grow” move are chosen uniformly over available dimensions and the observed range within the current leaf.
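The split proposal for a “grow” move might look like the sketch below. The function name `propose_split` is hypothetical, and treating a dimension as “available” only when the leaf's points take more than one value in it is an assumption of this example.

```python
import random


def propose_split(leaf_points, rng):
    """Draw a (dimension, cut-point) pair for a grow move.

    Returns None when no dimension has a non-degenerate observed range.
    """
    p = len(leaf_points[0])
    # Observed range of the leaf's points in every dimension.
    ranges = [(min(x[j] for x in leaf_points), max(x[j] for x in leaf_points))
              for j in range(p)]
    available = [j for j, (lo, hi) in enumerate(ranges) if lo < hi]
    if not available:
        return None
    j = rng.choice(available)       # uniform over available dimensions
    lo, hi = ranges[j]
    return j, rng.uniform(lo, hi)   # uniform over the observed range


rng = random.Random(1)
# Dimension 0 is constant in this leaf, so only dimension 1 can be split.
split = propose_split([[0.0, 1.0], [0.0, 3.0], [0.0, 2.5]], rng)
```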

2. Bayesian Formulation

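The joint Bayesian formulation is specified by independent priors across leaves, as outlined above. Because leaves are independent given the tree, the full data likelihood for $t$ samples factorizes over leaves:

$$p(y^t \mid T, x^t) = \prod_{\eta \in \mathcal{L}_T} p(y^\eta \mid x^\eta),$$

where $(x^\eta, y^\eta)$ denotes the observations that fall in leaf $\eta$.

Online posterior updates within an SMC framework update the particle weight for each tree $T_{t-1}^{(i)}$ at time $t$ as

$$w_t^{(i)} \propto p(y_t \mid x_t, T_{t-1}^{(i)}, [x, y]^{t-1}),$$

that is, the posterior predictive density of the new observation under that particle.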

3. Streaming Inference: Data Retirement and Forgetting

3.1 SMC Operation

A population of $N$ particles $\{T_t^{(i)}, S_t^{(i)}\}_{i=1}^N$ is maintained, where $S_t^{(i)}$ holds leaf-level sufficient statistics. On receiving $(x_t, y_t)$:

  • Weight each particle by predictive density at $(x_t, y_t)$.
  • Resample particles in proportion to these weights.
  • Propagate by performing a random local move (grow/prune/stay) at the affected leaf.
  • Update sufficient statistics in the relevant leaf by including $(x_t, y_t)$.
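The weighting and resampling steps can be sketched with the standard library alone. This is a minimal illustration using multinomial resampling, which is one common choice and not necessarily the scheme used in the source; `normalize` and `resample` are hypothetical helper names, and particles are represented by plain strings.

```python
import math
import random


def normalize(log_weights):
    """Turn log predictive densities into normalized resampling probabilities."""
    m = max(log_weights)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]


def resample(particles, log_weights, rng):
    """Multinomial resampling: draw len(particles) particles with replacement."""
    probs = normalize(log_weights)
    return rng.choices(particles, weights=probs, k=len(particles))


rng = random.Random(0)
# The second particle assigns zero predictive density to the new datum,
# so it cannot survive resampling.
survivors = resample(["tree_a", "tree_b", "tree_c"],
                     [0.0, float("-inf"), 0.0], rng)
```

In a full implementation each surviving particle would be copied before propagation, so that duplicated trees can evolve independently.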

3.2 Data Retirement (Active Discarding)

Each leaf maintains at most $w$ active datapoints. When the active set exceeds $w$, an active point $(x_m, y_m)$ is retired. The prior for the corresponding leaf is updated with the retired data point:

  • Regression leaves update Normal-Inverse-Gamma sufficient statistics, folding $(x_m, y_m)$ into the accumulated count, Gram matrix, cross-product, and sum of squares.
  • Classification leaves update Dirichlet counts: $a_{\eta, c} \leftarrow a_{\eta, c} + \mathbb{1}\{y_m = c\}$ for each class $c$.

Retirement updates preserve exact marginal likelihoods and posterior predictives in each leaf.
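For a classification leaf, the claim that retirement leaves the posterior predictive unchanged can be checked directly. The sketch below uses a Dirichlet-multinomial leaf; the function names and the list-based representation of counts and labels are assumptions made for illustration.

```python
def predictive(prior_counts, active_labels):
    """Posterior predictive class probabilities of a Dirichlet-multinomial leaf."""
    counts = list(prior_counts)
    for y in active_labels:
        counts[y] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def retire(prior_counts, active_labels, m):
    """Fold active point m into the prior counts and drop it from the active set."""
    new_prior = list(prior_counts)
    new_prior[active_labels[m]] += 1.0
    new_active = active_labels[:m] + active_labels[m + 1:]
    return new_prior, new_active


prior = [0.5, 0.5]   # hypothetical Dirichlet prior counts for two classes
active = [0, 1, 1]   # class labels of the active points
before = predictive(prior, active)
prior_after, active_after = retire(prior, active, 0)
after = predictive(prior_after, active_after)
```

Here `before` and `after` agree, since the retired label is counted once either way; only the storage location of that information changes.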

3.3 Forgetting Mechanism

To enable temporal adaptivity, a “forgetting factor” $\lambda \in [0, 1]$ is used, applying $a_{\eta, c} \leftarrow \lambda a_{\eta, c} + \mathbb{1}\{y_m = c\}$ to the Dirichlet counts $a_{\eta, c}$ when a point $(x_m, y_m)$ is retired (and analogous updates for the regression statistics). As $\lambda \to 1$, full memory is retained; as $\lambda \to 0$, only the most recent observations contribute. This adaptation allows DTs to track changes in nonstationary environments.
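The effect of the forgetting factor can be seen in a small sketch for classification counts. It assumes the decay is applied to the retired history each time a new point is retired; the function name and the parameter name `lam` are hypothetical.

```python
def retire_with_forgetting(prior_counts, label, lam):
    """Decay the retired history by lam, then add the newly retired label."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * c + (1.0 if k == label else 0.0)
            for k, c in enumerate(prior_counts)]


# With lam = 1 nothing is forgotten; with lam = 0 only the latest point remains.
full_memory = retire_with_forgetting([3.0, 2.0], 1, 1.0)
no_memory = retire_with_forgetting([3.0, 2.0], 1, 0.0)

# Retiring the same label repeatedly with lam = 0.5 makes its count approach
# 1 / (1 - lam) = 2, so the retired history has a bounded effective size.
counts = [0.0, 0.0]
for _ in range(50):
    counts = retire_with_forgetting(counts, 0, 0.5)
```

The bounded effective sample size is what lets the leaf posterior keep responding to new data instead of becoming dominated by old observations.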

4. High-Level Pseudo-Code Description

The online DT algorithm can be summarized as follows:

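For each incoming datum $(x_t, y_t)$:

  • Weight every particle by its predictive density at $(x_t, y_t)$ and resample in proportion to the weights.
  • For each resampled particle, apply a random grow, prune, or stay move at the leaf containing $x_t$.
  • Add $(x_t, y_t)$ to that leaf's active set and update its sufficient statistics.
  • If the active set exceeds its limit, retire one active point into the leaf prior, applying the forgetting factor to the retired history.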

5. Computational Complexity

Memory is needed for the active datapoints and for the tree structures held by the particles; with the active-set size and particle count held constant, total memory stays bounded as the stream grows. The time cost per data point breaks down as follows:

  • Weight computation: one predictive-density evaluation per particle, with cost governed by the leaf-model dimension.
  • Resampling: performed over the particle population.
  • Propagation (local move): split selection plus any structural change, both confined to the affected leaf.
  • Retirement update: a conjugate update, amortized per active set.

Overall, the per-datum cost is independent of the cumulative sample size.

6. Empirical Performance Summary

DTs were benchmarked on both synthetic and real-world datasets:

  • Regression (Friedman data): With a fixed limit on active points, retiring by an active learning criterion (ALC) nearly matches full-data DT performance in RMSE and predictive log-density at roughly 1/10 of the memory.
  • Classification (Spambase): With $w