- The paper demonstrates that increasing the number of trees does not always enhance classification accuracy, as non-monotonic error curves may occur in binary settings.
- It employs both theoretical analysis and empirical evaluation on 306 datasets to rigorously assess the impact of tree count on performance.
- The authors conclude that, despite occasional deviations, setting the number of trees as high as computationally feasible is preferable to tuning it, since the expected values of measures such as the Brier score converge as trees are added.
Overview of "To tune or not to tune the number of trees in random forest?"
The paper by Probst and Boulesteix critically examines an essential parameter of the Random Forest (RF) algorithm: the number of trees (T). It challenges the prevalent belief that more trees invariably lead to better performance, pointing out cases where increasing T does not improve, and may even degrade, classification accuracy. The authors approach this problem systematically through both theoretical analysis and empirical study.
Key Theoretical Insights
The paper establishes a theoretical foundation showing that the expected error rate of a Random Forest is not necessarily a monotonic function of T. This non-monotonicity primarily affects the classical error rate in binary classification, meaning that adding trees can, in expectation, make this measure worse on some datasets, so simply maximizing the number of trees is not always optimal for it. The analysis shows that non-monotonicity is not a concern for other performance measures: the expected Brier score and logarithmic loss in classification, like the mean squared error in regression, decrease monotonically as the number of trees grows.
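This mechanism can be illustrated with a small calculation (a sketch, not taken from the paper). For an observation whose single-tree misclassification probability is ε, a majority vote of T trees (T odd) errs with probability P(Binom(T, ε) > T/2), which tends to 1 when ε > 0.5 and to 0 when ε < 0.5, while the expected Brier-type contribution ε² + ε(1 − ε)/T always decreases in T. The ε values below are invented for illustration: a mix of easy observations and a few with ε just above 0.5 produces a non-monotonic expected error rate.

```python
from math import comb

def expected_error_rate(eps_list, T):
    # Mean over observations of P(majority of T trees errs);
    # for odd T this is P(Binom(T, eps) >= (T + 1) / 2).
    def p_majority_wrong(eps):
        return sum(comb(T, k) * eps**k * (1 - eps)**(T - k)
                   for k in range(T // 2 + 1, T + 1))
    return sum(p_majority_wrong(e) for e in eps_list) / len(eps_list)

def expected_brier(eps_list, T):
    # E[(B/T)^2] with B ~ Binom(T, eps) equals eps^2 + eps*(1 - eps)/T.
    return sum(e**2 + e * (1 - e) / T for e in eps_list) / len(eps_list)

# Hypothetical mix: 90 easy observations and 10 with eps just above 0.5.
eps = [0.05] * 90 + [0.55] * 10
for T in (1, 3, 5, 25, 125):
    print(T, round(expected_error_rate(eps, T), 4),
             round(expected_brier(eps, T), 4))
```

With this mix, the expected error rate first falls (from 0.10 at T = 1) and then rises again as the hard observations are misclassified ever more reliably, whereas the expected Brier score decreases at every step.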
Empirical Evaluation
The validity of the theoretical assertions is tested on 306 datasets obtained from the OpenML platform. The empirical results confirm that non-monotonic error rate curves can and do occur: in roughly 10% of the datasets, the OOB error rate curve is non-monotonic. Notably, datasets exhibiting this behavior typically contain observations whose observation-specific error rates (ε_i, the probability that a single tree misclassifies observation i) lie close to, but above, 0.5.
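The observation-specific error rates can be estimated directly from per-tree OOB predictions. The sketch below simulates such a prediction matrix (the Bernoulli draws and ε values are invented stand-ins; in the paper these rates come from real forests on real data) and computes both the estimates ε̂_i and the resulting error-rate curve:

```python
import random

random.seed(1)

# Simulated stand-in for a matrix of per-tree OOB predictions: wrong[t][i] is
# True when tree t misclassifies observation i. The eps values are invented.
n_trees = 500
eps = [0.05] * 90 + [0.55] * 10
wrong = [[random.random() < e for e in eps] for _ in range(n_trees)]

# Estimate observation-specific error rates eps_i: the fraction of trees
# that err on observation i.
eps_hat = [sum(w[i] for w in wrong) / n_trees for i in range(len(eps))]
hard = [i for i, e in enumerate(eps_hat) if e > 0.5]
print(f"{len(hard)} observations with estimated eps_i > 0.5")

# Error rate curve: majority-vote error over the first t trees (t odd).
def error_at(t):
    n_wrong = sum(sum(w[i] for w in wrong[:t]) > t / 2
                  for i in range(len(eps)))
    return n_wrong / len(eps)

curve = {t: error_at(t) for t in (1, 5, 25, 125, 499)}
print(curve)
```

Observations flagged in `hard` are exactly the ones that drive the long-run increase of the curve: with enough trees, the majority vote misclassifies them almost surely.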
Implications and Practice
Despite the occasional non-monotonic error rate pattern, the authors argue against tuning T. They recommend setting T as large as computationally feasible, relying on the convergence of the expected performance measures. This stance is supported by the observation that the deviations in performance are typically small and that larger forests offer better robustness and stability, particularly when measures other than the raw error rate are considered. From a practical standpoint, the research indicates that it is beneficial to focus on more refined performance measures, such as the Brier score, and to ensure that their values have converged.
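This "grow until the measure converges" advice can be turned into a simple stopping rule. The sketch below is a toy simulation (the error probabilities, tolerance, and checkpoint interval are all invented; real code would monitor OOB estimates from an actual forest): it adds trees one at a time and stops once a Brier-type score stabilizes between checkpoints.

```python
import random

random.seed(0)

def grow_until_converged(eps, tol=1e-4, check_every=100, max_trees=10_000):
    """Add simulated trees until the Brier-type score (mean squared
    fraction of erring trees) changes by less than tol per checkpoint."""
    wrong = [0] * len(eps)   # number of trees so far that misclassify obs i
    prev, score = None, None
    for t in range(1, max_trees + 1):
        for i, e in enumerate(eps):
            if random.random() < e:   # tree t errs on observation i
                wrong[i] += 1
        if t % check_every == 0:
            score = sum((w / t) ** 2 for w in wrong) / len(eps)
            if prev is not None and abs(score - prev) < tol:
                return t, score
            prev = score
    return max_trees, score

# Hypothetical per-observation single-tree error probabilities.
eps = [0.05] * 90 + [0.55] * 10
T, score = grow_until_converged(eps)
print(f"stopped after {T} trees; Brier-type score {score:.4f}")
```

The point of using a Brier-type score rather than the raw error rate as the stopping criterion is exactly the paper's: its expected value decreases monotonically, so "no further change" is a meaningful convergence signal.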
Broader Considerations
The paper highlights a pivotal issue in ensemble learning: parameter selection and model optimization. It argues that a non-monotonic error rate curve should not by itself trigger tuning of T, as it can emerge from the distribution of observation-specific error rates in the data rather than indicate an underlying problem with the model. Future work could explore whether similar results extend beyond the RF algorithm to other ensemble methods that use bagging.
Conclusion
Probst and Boulesteix provide a comprehensive examination of the choice of the number of trees in Random Forests, integrating theoretical results with substantial empirical evidence. While the findings invite caution about the habit of tuning every parameter, they firmly support the conventional approach of setting the number of trees as high as computationally feasible rather than tuning it. The study underscores the value of understanding a model's convergence behavior and of using computational resources judiciously as theoretical and empirical evidence shape machine learning practice.