Random forest theory for dependent data

Develop theoretical extensions of Breiman’s classical random forest algorithm to dependent data-generating processes, and prove consistency and associated performance guarantees under dependence structures (e.g., temporal dependence).
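
To fix ideas, the goal stated above can be written out in standard notation. This is only a sketch, not the paper's own formulation: the symbols Z_i, m, m_n and the choice of strong (alpha) mixing as the dependence measure are assumptions made here for illustration, and other dependence notions (for example, cluster or design-induced dependence) would need different definitions.

```latex
% Illustrative notation (assumed, not taken from the source):
%   Z_i = (X_i, Y_i), i = 1, ..., n : a stationary, possibly dependent sequence
%   m(x) = E[Y | X = x]             : the regression function
%   m_n                             : the random forest estimate built from Z_1, ..., Z_n
%   X                               : a draw from the marginal law of X_i, independent of the sample

% L2 consistency: the mean squared error of the forest vanishes as n grows.
\[
  \lim_{n \to \infty} \mathbb{E}\bigl[ (m_n(X) - m(X))^2 \bigr] = 0 .
\]

% One common way to quantify temporal dependence: the strong mixing coefficient.
\[
  \alpha(k) = \sup_{t} \;
  \sup_{\substack{A \in \sigma(Z_s :\, s \le t) \\ B \in \sigma(Z_s :\, s \ge t + k)}}
  \bigl| \mathbb{P}(A \cap B) - \mathbb{P}(A)\,\mathbb{P}(B) \bigr| ,
  \qquad \alpha(k) \to 0 \ \text{as } k \to \infty .
\]
```

For an independent sequence, alpha(k) = 0 for every k >= 1, so the i.i.d. setting is the special case with no dependence. A result of the kind sought here would establish the first display under a condition such as a decay rate on alpha(k).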

Background

The paper notes that, beyond the i.i.d. setting, the treatment of dependence in random forest theory remains insufficiently understood. Although work on time-dependent processes is emerging, a general framework with guarantees for dependent data has not yet been established.

Addressing dependence is critical for applications involving time series, clustered sampling, or complex designs, where independence assumptions are violated and theoretical assurances are necessary for valid inference.

References

Moreover, many questions remain open, for instance regarding finite-sample guarantees or extensions to dependent data.

Distributional Random Forests for Complex Survey Designs on Reproducing Kernel Hilbert Spaces (2512.08179 - Zou et al., 9 Dec 2025) in Section 1 (Introduction)