
Time-Robust Minimax Rates

Updated 6 February 2026
  • Time-robust minimax rates are provably optimal rates of estimation in online, sequential models that adjust to dynamic data and evolving partition geometry.
  • They are achieved by balancing bias and variance through recursive, data-driven partitioning methods in nonparametric regression and classification tasks, exemplified by Mondrian Forests.
  • Adaptive and oblique forest variants enhance performance by tuning splitting rules and employing debiasing techniques, ensuring robust estimation under covariate shifts and higher-order smoothness.

Time-robust minimax rates are the provably optimal rates of statistical estimation or prediction in structured learning environments, notably nonparametric regression and classification, that remain achievable by estimators constructed under explicit time-evolution, online, or recursive constraints. The concept is closely tied to the construction and analysis of sequentially or recursively built tree ensembles such as Mondrian forests, random tessellation forests, and their axis-aligned or oblique generalizations, whose tuning and theoretical performance guarantees must account for both the sample size and the partition geometry as it evolves over time.

1. Minimax Rates: Definition and Classical Results

The minimax rate for nonparametric regression with regression function $f$ in a smoothness class $\mathcal{C}^s([0,1]^d)$ is the rate at which the worst-case mean squared error (MSE), over all estimators and all functions in the class, decays with the sample size $n$. For Hölder or Lipschitz classes ($s$-smoothness), the classical rate is

$$\inf_{\hat f_n}\ \sup_{f\in \mathcal{C}^s(L)} \mathbb{E}\big[(\hat f_n(X) - f(X))^2\big] = \Theta\big(n^{-2s/(2s+d)}\big)$$

for $X \in [0,1]^d$ and standard noise settings.

These rates are known to be optimal: no estimator can achieve better asymptotic scaling without additional structural assumptions (Mourtada et al., 2017, Mourtada et al., 2018).
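As a quick numerical check, the exponent in the display above can be tabulated; `minimax_exponent` is an illustrative helper, not a function from the cited papers.

```python
# Minimax MSE exponent for s-smooth nonparametric regression in
# dimension d: the worst-case risk scales as n^(-2s/(2s+d)).
def minimax_exponent(s: float, d: int) -> float:
    """Return r such that the minimax MSE is Theta(n^(-r))."""
    return 2 * s / (2 * s + d)

# Lipschitz case (s = 1): the exponent shrinks as d grows --
# the familiar curse of dimensionality.
for d in (1, 2, 10):
    print(d, minimax_exponent(1.0, d))
```

For $s=1$ this recovers the $n^{-2/3}$ rate in one dimension and $n^{-2/(d+2)}$ in general.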

2. Time-Consistency and Online Algorithms

A distinguishing feature of time-robust minimax rates is their attainability by fully online or sequential algorithms. For example, the Mondrian forest, constructed as an online extension of random forests, guarantees the minimax-optimal $n^{-2/(d+2)}$ rate for Lipschitz regression even as the model is updated pointwise on sequentially observed data, provided the so-called lifetime parameter $\lambda_n$ is increased appropriately: $\lambda_n \asymp n^{1/(d+2)}$ (Mourtada et al., 2017).

The online property is achieved by leveraging the memoryless property of exponential waiting times in the recursive partitioning process, enabling seamless extension of the partition as new data arrives and ensuring universal consistency and minimax optimality are preserved through time (Mourtada et al., 2017).
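The exponential-clock mechanism behind this memorylessness can be sketched in a few lines; this is a minimal illustrative simulation of the Mondrian partition process (function and variable names are my own), not the estimator from the cited paper.

```python
import random

def sample_mondrian_splits(lower, upper, budget, t=0.0):
    """Recursively sample an axis-aligned Mondrian partition of the
    box [lower, upper] up to lifetime `budget`.  Each cell born at
    time t splits after an Exp(linear dimension) waiting time, so by
    the memoryless property the partition can later be refined
    online simply by raising the budget (lifetime) parameter."""
    linear_dim = sum(u - l for l, u in zip(lower, upper))
    t = t + random.expovariate(linear_dim)  # waiting time ~ Exp(linear_dim)
    if t > budget:
        return [(tuple(lower), tuple(upper))]  # lifetime exhausted: leaf
    # Split axis chosen proportionally to side length; cut point uniform.
    sides = [u - l for l, u in zip(lower, upper)]
    axis = random.choices(range(len(sides)), weights=sides)[0]
    cut = random.uniform(lower[axis], upper[axis])
    left_upper = list(upper); left_upper[axis] = cut
    right_lower = list(lower); right_lower[axis] = cut
    return (sample_mondrian_splits(list(lower), left_upper, budget, t)
            + sample_mondrian_splits(right_lower, list(upper), budget, t))

leaves = sample_mondrian_splits([0.0, 0.0], [1.0, 1.0], budget=2.0)
```

The returned leaves always tile the original box, and increasing `budget` only subdivides existing cells, which is exactly the property that keeps the online and batch constructions distributionally identical.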

3. Purely Random Forests: Bias-Variance Decomposition and Rate Optimality

Purely random forests—ensembles of trees built without data-dependent splits—serve as canonical models for studying minimax rates under explicit randomness and minimal adaptivity. In the simplest one-dimensional case, the purely uniformly random forest (PURF) achieves the minimax rate $n^{-2/3}$ for Lipschitz regression (Genuer, 2010). For $d$-dimensional balanced purely random forests, the scaling is generally suboptimal unless additional control over partition balance is imposed (Arlot et al., 2014, Neumeyer et al., 17 Nov 2025).

The generic bias–variance decomposition in these models yields:

  • $\text{Bias}^2 \sim \mathbb{E}[\mathrm{diam}(\text{leaf})^2] \sim \lambda^{-2}$ (partition scale $\lambda$),
  • $\text{Variance} \sim \lambda^d/n$.

Balancing these terms by tuning $\lambda$ as a function of $n$ achieves the minimax rate (Mourtada et al., 2017, Mourtada et al., 2018).
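The balancing act can be verified numerically: minimizing the two-term bound over a grid recovers the scaling $\lambda \asymp n^{1/(d+2)}$. The constants below are illustrative placeholders, not values from the cited analyses.

```python
import numpy as np

def risk_bound(lam, n, d, c_bias=1.0, c_var=1.0):
    """Two-term upper bound c_b * lam^-2 + c_v * lam^d / n from the
    bias-variance decomposition above (constants illustrative)."""
    return c_bias * lam ** -2 + c_var * lam ** d / n

def best_lambda(n, d):
    """Minimise the bound over a log-spaced grid of partition scales."""
    grid = np.logspace(-1, 3, 4000)
    return float(grid[np.argmin(risk_bound(grid, n, d))])

# The minimiser tracks n^{1/(d+2)}, so the risk scales as n^{-2/(d+2)}.
for n in (10 ** 3, 10 ** 5):
    print(n, best_lambda(n, d=2), n ** (1 / 4))
```

For $d=2$ the bound $\lambda^{-2}+\lambda^2/n$ is minimized exactly at $\lambda = n^{1/4}$, which the grid search reproduces up to discretization.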

4. Adaptive and Data-Driven Forests

Recent advances include data-dependent forests such as adaptive split balancing forests (ASBF), which employ permutation-based or cyclical splitting rules to enforce approximately equal splitting across all coordinates. This mitigates the "skinny cell" phenomenon and ensures that the mean squared diameter of a leaf decays at the ideal rate $2^{-2M/d}$ after $M$ splits, enabling minimax IMSE rates $n^{-2/(d+2)}$ for Lipschitz functions and $n^{-2(q+\beta)/(d+2(q+\beta))}$ for any Hölder smoothness $q+\beta>0$ (Zhang et al., 2024, Neumeyer et al., 17 Nov 2025).
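The $2^{-2M/d}$ decay is a direct consequence of cycling halving splits through the coordinates; a minimal sketch (helper name and setup are my own, assuming an exact halving split on each pass):

```python
import numpy as np

def cyclical_leaf_sq_diameter(M: int, d: int) -> float:
    """Squared diameter of a leaf of [0,1]^d after M halving splits
    that cycle through the coordinates, i.e. the balanced regime that
    split-balancing rules are designed to enforce."""
    # Coordinate j has been halved M//d times, plus once more for the
    # first M % d coordinates in the current cycle.
    halvings = [M // d + (1 if j < M % d else 0) for j in range(d)]
    sides = np.array([2.0 ** -h for h in halvings])
    return float(np.sum(sides ** 2))  # ~ d * 2^(-2M/d) when balanced

# Unbalanced comparison: spending all M splits on one coordinate
# leaves the other d-1 side lengths at 1, so the squared diameter
# stalls near d - 1 instead of shrinking geometrically.
```

With $M=4$, $d=2$ each side is halved twice and the squared diameter is $2\cdot 2^{-4}$, matching the balanced bound.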

A comparison of forest types and their associated minimax rates is provided below:

| Forest model | Achievable minimax rate | Conditions for optimality |
|---|---|---|
| Online Mondrian forest | $n^{-2/(d+2)}$ | $L$-Lipschitz, $\lambda_n \asymp n^{1/(d+2)}$ |
| STIT/Poisson tessellation forests | $n^{-2s/(d+2s)}$ | $\mathcal{C}^s$-Hölder, properly tuned $\lambda$ |
| Adaptive split balancing forest | $n^{-2s/(d+2s)}$ | Split balance, leafwise polynomial fit |
| Purely uniformly random forest | $n^{-2/3}$ (for $d=1$) | 1D, Lipschitz |
| Ehrenfest centered PRF (multi-dim.) | $n^{-2\alpha/(2\alpha+p)}$ | Enforced balance on splits |

5. Oblique and Tessellation Forests: Intrinsic Dimension and Minimax Adaptivity

Time-robust minimax rates can be realized by random forest models that exploit general (oblique) split directions. Random tessellation forests—such as STIT forests and Poisson hyperplane forests—enable minimax-optimal rates $n^{-2s/(d+2s)}$ in arbitrary dimension and further adapt to the intrinsic dimension of the support of the data, i.e., $d$ can be replaced by the effective dimension of the manifold on which the covariates lie (O'Reilly et al., 2021, O'Reilly, 2024).

Oblique Mondrian forests, which split along carefully chosen linear combinations of covariates, can further adapt to the dimension $s \ll d$ of multi-index/ridge-function models, matching the minimax rate $n^{-2\beta/(s+2\beta)}$ for $\mathcal{C}^{0,\beta}$ and $n^{-(2+2\beta)/(s+2+2\beta)}$ for $\mathcal{C}^{1,\beta}$, provided the chosen split directions adequately capture the active subspace (O'Reilly, 2024). Axis-aligned Mondrian forests, by contrast, cannot escape the curse of the ambient dimension and are suboptimal for such models.
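The payoff from adapting to the active subspace is easy to quantify by comparing the exponent evaluated at the intrinsic versus the ambient dimension; `holder_exponent` is an illustrative helper, not notation from the cited work.

```python
def holder_exponent(beta: float, dim: int) -> float:
    """Minimax MSE exponent 2*beta / (dim + 2*beta) for C^{0,beta}
    regression in `dim` dimensions."""
    return 2 * beta / (dim + 2 * beta)

# Ridge model f(x) = g(Ax) with an s-dimensional active subspace, s << d.
beta, s, d = 1.0, 2, 50
print(holder_exponent(beta, s))  # oblique forest: exponent set by s
print(holder_exponent(beta, d))  # axis-aligned: stuck with ambient d
```

With $\beta=1$, $s=2$, $d=50$, the exponent improves from $2/52 \approx 0.038$ to $1/2$, i.e. from an essentially nonparametric-in-$d$ rate to the two-dimensional rate.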

6. Minimax Adaptivity Under Structured Data: Clustering and Covariate Shift

Clustered random forests (CRF) demonstrate that the minimax $n^{-2/(d+2)}$ rate for Lipschitz mean estimation persists even under within-cluster dependence, provided an appropriately weighted least squares estimator is used in the terminal nodes and the splitting mechanism enforces symmetry and balance (Young et al., 16 Mar 2025). The optimal weighting under covariate shift must be chosen with reference to the target distribution, but the minimax exponent is unaffected: the optimality is robust to both dependence and distributional changes, up to multiplicative constants.
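The terminal-node fit reduces to a weighted least-squares fit of a constant; a minimal sketch follows, with the weights supplied directly for illustration rather than derived from the correlation structure as in the cited paper.

```python
import numpy as np

def weighted_leaf_estimate(y, w):
    """Weighted least-squares fit of a constant in a terminal node:
    argmin_c sum_i w_i * (y_i - c)^2  =  sum(w * y) / sum(w).
    In the clustered setting the weights would be derived from the
    within-cluster dependence (and, under covariate shift, from the
    target distribution); here they are illustrative inputs."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Two tightly correlated observations from one cluster are jointly
# downweighted relative to an independent observation.
est = weighted_leaf_estimate([1.0, 1.0, 5.0], [0.5, 0.5, 1.0])
```

With uniform weights the estimator reduces to the usual leaf mean, so the weighting changes constants but not the structure of the bias-variance trade-off.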

7. Debiasing and Higher-Order Adaptation

For regression functions of smoothness order $\beta>2$, debiased Mondrian random forests—linear combinations of ensembles at multiple partition scales, with weights obtained via generalized jackknife or Vandermonde inversion—can achieve higher-order minimax rates, e.g., $n^{-2\beta/(d+2\beta)}$ for $\mathcal{C}^\beta$-smooth functions. This is not attainable by naive partitioning without debiasing, since higher smoothness yields smaller leading bias terms only after cancellation of lower-order components (Cattaneo et al., 2023).
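The Vandermonde-inversion step can be sketched directly, under the stylized assumption that the bias of an ensemble at scale $\lambda$ expands in powers of $\lambda^{-2}$ (function name and scale choices are illustrative):

```python
import numpy as np

def debiasing_weights(scales):
    """Generalized-jackknife weights for ensembles at partition
    scales lam_1, ..., lam_J.  Assuming each estimator has bias
    expansion sum_k b_k * lam_j^(-2k), solving the Vandermonde-type
    system V w = e_1 with V[k, j] = lam_j^(-2k) keeps the signal
    (row k = 0, i.e. weights summing to 1) while cancelling the
    first J - 1 bias terms (rows k = 1, ..., J-1)."""
    lam = np.asarray(scales, dtype=float)
    J = len(lam)
    V = np.array([[l ** (-2 * k) for l in lam] for k in range(J)])
    e1 = np.zeros(J)
    e1[0] = 1.0
    return np.linalg.solve(V, e1)

w = debiasing_weights([1.0, 2.0, 4.0])
```

The resulting weights sum to one and annihilate the $\lambda^{-2}$ and $\lambda^{-4}$ bias components, leaving a leading bias of order $\lambda^{-6}$ and thereby unlocking the higher-order exponent.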
