
Time-Robust Minimax Rates

Updated 6 February 2026
  • Time-robust minimax rates are provably optimal rates of estimation in online, sequential models that adjust to dynamic data and evolving partition geometry.
  • They are achieved by balancing bias and variance through recursive, data-driven partitioning methods in nonparametric regression and classification tasks, exemplified by Mondrian Forests.
  • Adaptive and oblique forest variants enhance performance by tuning splitting rules and employing debiasing techniques, ensuring robust estimation under covariate shifts and higher-order smoothness.

Time-robust minimax rates are the provably optimal rates of statistical estimation or prediction in structured learning environments, notably nonparametric regression and classification, that remain achievable by estimators constructed under explicit time-evolution, online, or recursive constraints. The concept is closely tied to the construction and analysis of sequentially or recursively built tree ensembles such as Mondrian forests, random tessellation forests, and their axis-aligned or oblique generalizations, whose tuning and theoretical performance guarantees must account for both the sample size and the partition geometry as it evolves over time.

1. Minimax Rates: Definition and Classical Results

The minimax rate for nonparametric regression with regression function $f$ in a smoothness class $\mathcal{C}^s([0,1]^d)$ is the rate at which the worst-case mean squared error (MSE), over all estimators and all functions in the class, decays with the sample size $n$. For Hölder or Lipschitz classes ($s$-smoothness), the classical rate is

$$\inf_{\hat f_n}\ \sup_{f\in \mathcal{C}^s(L)} \mathbb{E}\big[(\hat f_n(X) - f(X))^2\big] = \Theta\big(n^{-2s/(2s+d)}\big)$$

for $X \in [0,1]^d$ and standard noise settings.

These rates are known to be optimal: no estimator can achieve better asymptotic scaling without additional structural assumptions (Mourtada et al., 2017, Mourtada et al., 2018).
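As a quick numerical check, the exponent in the display above can be tabulated; `minimax_exponent` is an illustrative helper, not a function from the cited papers.

```python
# Minimax MSE exponent for s-smooth nonparametric regression in
# dimension d: the worst-case risk scales as n^(-2s/(2s+d)).
def minimax_exponent(s: float, d: int) -> float:
    """Return r such that the minimax MSE is Theta(n^(-r))."""
    return 2 * s / (2 * s + d)

# Lipschitz case (s = 1): the exponent shrinks as d grows --
# the familiar curse of dimensionality.
for d in (1, 2, 10):
    print(d, minimax_exponent(1.0, d))
```

For $s=1$ this recovers the $n^{-2/3}$ rate in one dimension and $n^{-2/(d+2)}$ in general.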

2. Time-Consistency and Online Algorithms

A distinguishing feature of time-robust minimax rates is their attainability by fully online or sequential algorithms. For example, the Mondrian forest, constructed as an online extension of random forests, guarantees the minimax-optimal $n^{-2/(d+2)}$ rate for Lipschitz regression even as the model is updated pointwise on sequentially observed data, provided the so-called lifetime parameter $\lambda_n$ is increased appropriately: $\lambda_n \asymp n^{1/(d+2)}$ (Mourtada et al., 2017).

The online property is achieved by leveraging the memoryless property of exponential waiting times in the recursive partitioning process, enabling seamless extension of the partition as new data arrives and ensuring universal consistency and minimax optimality are preserved through time (Mourtada et al., 2017).
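The exponential-clock mechanism behind this memorylessness can be sketched in a few lines; this is a minimal illustrative simulation of the Mondrian partition process (function and variable names are my own), not the estimator from the cited paper.

```python
import random

def sample_mondrian_splits(lower, upper, budget, t=0.0):
    """Recursively sample an axis-aligned Mondrian partition of the
    box [lower, upper] up to lifetime `budget`.  Each cell born at
    time t splits after an Exp(linear dimension) waiting time, so by
    the memoryless property the partition can later be refined
    online simply by raising the budget (lifetime) parameter."""
    linear_dim = sum(u - l for l, u in zip(lower, upper))
    t = t + random.expovariate(linear_dim)  # waiting time ~ Exp(linear_dim)
    if t > budget:
        return [(tuple(lower), tuple(upper))]  # lifetime exhausted: leaf
    # Split axis chosen proportionally to side length; cut point uniform.
    sides = [u - l for l, u in zip(lower, upper)]
    axis = random.choices(range(len(sides)), weights=sides)[0]
    cut = random.uniform(lower[axis], upper[axis])
    left_upper = list(upper); left_upper[axis] = cut
    right_lower = list(lower); right_lower[axis] = cut
    return (sample_mondrian_splits(list(lower), left_upper, budget, t)
            + sample_mondrian_splits(right_lower, list(upper), budget, t))

leaves = sample_mondrian_splits([0.0, 0.0], [1.0, 1.0], budget=2.0)
```

The returned leaves always tile the original box, and increasing `budget` only subdivides existing cells, which is exactly the property that keeps the online and batch constructions distributionally identical.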

3. Purely Random Forests: Bias-Variance Decomposition and Rate Optimality

Purely random forests—ensembles of trees built without data-dependent splits—serve as canonical models for studying minimax rates under explicit randomness and minimal adaptivity. In the simplest one-dimensional case, the purely uniformly random forest (PURF) achieves the minimax rate $n^{-2/3}$ for Lipschitz regression (Genuer, 2010). For $d$-dimensional balanced purely random forests, the scaling is generally suboptimal unless additional control over partition balance is imposed (Arlot et al., 2014, Neumeyer et al., 17 Nov 2025).

The generic bias–variance decomposition in these models yields:

  • $\text{Bias}^2 \sim \mathbb{E}[\mathrm{diam}(\text{leaf})^2] \sim \lambda^{-2}$ (partition scale $\lambda$),
  • $\text{Variance} \sim \lambda^d/n$.

Balancing these terms by tuning $\lambda$ as a function of $n$ achieves the minimax rate (Mourtada et al., 2017, Mourtada et al., 2018).
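The balancing act can be verified numerically: minimizing the two-term bound over a grid recovers the scaling $\lambda \asymp n^{1/(d+2)}$. The constants below are illustrative placeholders, not values from the cited analyses.

```python
import numpy as np

def risk_bound(lam, n, d, c_bias=1.0, c_var=1.0):
    """Two-term upper bound c_b * lam^-2 + c_v * lam^d / n from the
    bias-variance decomposition above (constants illustrative)."""
    return c_bias * lam ** -2 + c_var * lam ** d / n

def best_lambda(n, d):
    """Minimise the bound over a log-spaced grid of partition scales."""
    grid = np.logspace(-1, 3, 4000)
    return float(grid[np.argmin(risk_bound(grid, n, d))])

# The minimiser tracks n^{1/(d+2)}, so the risk scales as n^{-2/(d+2)}.
for n in (10 ** 3, 10 ** 5):
    print(n, best_lambda(n, d=2), n ** (1 / 4))
```

For $d=2$ the bound $\lambda^{-2}+\lambda^2/n$ is minimized exactly at $\lambda = n^{1/4}$, which the grid search reproduces up to discretization.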

4. Adaptive and Data-Driven Forests

Recent advances include data-dependent forests such as adaptive split balancing forests (ASBF), which employ permutation-based or cyclical splitting rules to enforce approximately equal splitting across all coordinates. This mitigates the "skinny cell" phenomenon and ensures that the mean squared diameter of a leaf decays at the ideal rate $2^{-2M/d}$ after $M$ splits, enabling minimax IMSE rates $n^{-2/(d+2)}$ for Lipschitz functions and $n^{-2(q+\beta)/(d+2(q+\beta))}$ for any Hölder smoothness $q+\beta>0$ (Zhang et al., 2024, Neumeyer et al., 17 Nov 2025).
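The $2^{-2M/d}$ decay is a direct consequence of cycling halving splits through the coordinates; a minimal sketch (helper name and setup are my own, assuming an exact halving split on each pass):

```python
import numpy as np

def cyclical_leaf_sq_diameter(M: int, d: int) -> float:
    """Squared diameter of a leaf of [0,1]^d after M halving splits
    that cycle through the coordinates, i.e. the balanced regime that
    split-balancing rules are designed to enforce."""
    # Coordinate j has been halved M//d times, plus once more for the
    # first M % d coordinates in the current cycle.
    halvings = [M // d + (1 if j < M % d else 0) for j in range(d)]
    sides = np.array([2.0 ** -h for h in halvings])
    return float(np.sum(sides ** 2))  # ~ d * 2^(-2M/d) when balanced

# Unbalanced comparison: spending all M splits on one coordinate
# leaves the other d-1 side lengths at 1, so the squared diameter
# stalls near d - 1 instead of shrinking geometrically.
```

With $M=4$, $d=2$ each side is halved twice and the squared diameter is $2\cdot 2^{-4}$, matching the balanced bound.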

A comparison of forest types and their associated minimax rates is provided below:

| Forest model | Achievable minimax rate | Conditions for optimality |
|---|---|---|
| Online Mondrian forest | $n^{-2/(d+2)}$ | $L$-Lipschitz, $\lambda_n \asymp n^{1/(d+2)}$ |
| STIT/Poisson tessellation forests | $n^{-2s/(d+2s)}$ | $\mathcal{C}^s$-Hölder, properly tuned $\lambda$ |
| Adaptive split balancing forest | $n^{-2s/(d+2s)}$ | Split balance, leafwise polynomial fit |
| Purely uniformly random forest | $n^{-2/3}$ (for $d=1$) | 1D, Lipschitz |
| Ehrenfest centered PRF (multi-dim.) | $n^{-2\alpha/(2\alpha+p)}$ | Enforced balance on splits |

5. Oblique and Tessellation Forests: Intrinsic Dimension and Minimax Adaptivity

Time-robust minimax rates can be realized by random forest models that exploit general (oblique) split directions. Random tessellation forests—such as STIT forests and Poisson hyperplane forests—enable minimax-optimal rates $n^{-2s/(d+2s)}$ in arbitrary dimension and further adapt to the intrinsic dimension of the support of the data, i.e., $d$ can be replaced by the effective dimension of the manifold on which the covariates lie (O'Reilly et al., 2021, O'Reilly, 2024).

Oblique Mondrian forests, which split along carefully chosen linear combinations of covariates, can further adapt to the dimension $s \ll d$ of multi-index/ridge-function models, matching the minimax rate $n^{-2\beta/(s+2\beta)}$ for $\mathcal{C}^{0,\beta}$ and $n^{-(2+2\beta)/(s+2+2\beta)}$ for $\mathcal{C}^{1,\beta}$, provided the chosen split directions adequately capture the active subspace (O'Reilly, 2024). Axis-aligned Mondrian forests, by contrast, cannot escape the curse of the ambient dimension and are suboptimal for such models.
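The payoff from adapting to the active subspace is easy to quantify by comparing the exponent evaluated at the intrinsic versus the ambient dimension; `holder_exponent` is an illustrative helper, not notation from the cited work.

```python
def holder_exponent(beta: float, dim: int) -> float:
    """Minimax MSE exponent 2*beta / (dim + 2*beta) for C^{0,beta}
    regression in `dim` dimensions."""
    return 2 * beta / (dim + 2 * beta)

# Ridge model f(x) = g(Ax) with an s-dimensional active subspace, s << d.
beta, s, d = 1.0, 2, 50
print(holder_exponent(beta, s))  # oblique forest: exponent set by s
print(holder_exponent(beta, d))  # axis-aligned: stuck with ambient d
```

With $\beta=1$, $s=2$, $d=50$, the exponent improves from $2/52 \approx 0.038$ to $1/2$, i.e. from an essentially nonparametric-in-$d$ rate to the two-dimensional rate.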

6. Minimax Adaptivity Under Structured Data: Clustering and Covariate Shift

Clustered random forests (CRF) demonstrate that the minimax $n^{-2/(d+2)}$ rate for Lipschitz mean estimation persists even under within-cluster dependence, provided an appropriately weighted least squares estimator is used in the terminal nodes and the splitting mechanism enforces symmetry and balance (Young et al., 16 Mar 2025). The optimal weighting under covariate shift must be chosen with reference to the target distribution, but the minimax exponent is unaffected: the optimality is robust to both dependence and distributional changes, up to multiplicative constants.
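The terminal-node fit reduces to a weighted least-squares fit of a constant; a minimal sketch follows, with the weights supplied directly for illustration rather than derived from the correlation structure as in the cited paper.

```python
import numpy as np

def weighted_leaf_estimate(y, w):
    """Weighted least-squares fit of a constant in a terminal node:
    argmin_c sum_i w_i * (y_i - c)^2  =  sum(w * y) / sum(w).
    In the clustered setting the weights would be derived from the
    within-cluster dependence (and, under covariate shift, from the
    target distribution); here they are illustrative inputs."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Two tightly correlated observations from one cluster are jointly
# downweighted relative to an independent observation.
est = weighted_leaf_estimate([1.0, 1.0, 5.0], [0.5, 0.5, 1.0])
```

With uniform weights the estimator reduces to the usual leaf mean, so the weighting changes constants but not the structure of the bias-variance trade-off.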

7. Debiasing and Higher-Order Adaptation

For regression functions of smoothness order $\beta>2$, debiased Mondrian random forests—linear combinations of ensembles at multiple partition scales, with weights obtained via generalized jackknife or Vandermonde inversion—can achieve higher-order minimax rates, e.g., $n^{-2\beta/(d+2\beta)}$ for $\mathcal{C}^\beta$-smooth functions. This is not attainable by naive partitioning without debiasing, since higher smoothness yields smaller leading bias terms only after cancellation of lower-order components (Cattaneo et al., 2023).
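The Vandermonde-inversion step can be sketched directly, under the stylized assumption that the bias of an ensemble at scale $\lambda$ expands in powers of $\lambda^{-2}$ (function name and scale choices are illustrative):

```python
import numpy as np

def debiasing_weights(scales):
    """Generalized-jackknife weights for ensembles at partition
    scales lam_1, ..., lam_J.  Assuming each estimator has bias
    expansion sum_k b_k * lam_j^(-2k), solving the Vandermonde-type
    system V w = e_1 with V[k, j] = lam_j^(-2k) keeps the signal
    (row k = 0, i.e. weights summing to 1) while cancelling the
    first J - 1 bias terms (rows k = 1, ..., J-1)."""
    lam = np.asarray(scales, dtype=float)
    J = len(lam)
    V = np.array([[l ** (-2 * k) for l in lam] for k in range(J)])
    e1 = np.zeros(J)
    e1[0] = 1.0
    return np.linalg.solve(V, e1)

w = debiasing_weights([1.0, 2.0, 4.0])
```

The resulting weights sum to one and annihilate the $\lambda^{-2}$ and $\lambda^{-4}$ bias components, leaving a leading bias of order $\lambda^{-6}$ and thereby unlocking the higher-order exponent.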
