- The paper introduces Hierarchical Shrinkage, a post-hoc regularization method that significantly reduces overfitting in tree-based models.
- The method shrinks each node's prediction toward the sample means of its ancestors and is provably equivalent to ridge regression on a decision-stump basis, improving interpretability without altering tree structure.
- Numerical experiments show that Hierarchical Shrinkage improves predictive performance by roughly 6–10% over CART baselines and outperforms the leaf-based (uniform) shrinkage used in XGBoost.
Hierarchical Shrinkage: Improving Tree-Based Methods
The paper presents Hierarchical Shrinkage (HS), an approach for improving both the performance and the interpretability of tree-based models such as decision trees and random forests (RF). Unlike traditional remedies for overfitting that modify the tree structure itself (e.g., pruning or early stopping), HS acts post-hoc, without structural alterations, making it a versatile and computationally cheap addition.
Methodology and Implementation
HS regularizes a fitted tree by shrinking each node's prediction toward the sample means of its ancestors, with the amount of shrinkage controlled by a single regularization parameter together with the number of training points at each ancestor node. Because it never touches the splits, HS is compatible with any tree-growing algorithm and can be layered on top of existing regularization techniques, enhancing performance without added structural complexity.
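Concretely, for a query point falling in leaf $t_L$ with root-to-leaf path $t_0, t_1, \ldots, t_L$, the HS prediction takes the form (paraphrasing the paper's formula):

$$
\hat{f}_{\mathrm{HS}}(\mathbf{x}) \;=\; \hat{\mathbb{E}}_{t_0}[y] \;+\; \sum_{l=1}^{L} \frac{\hat{\mathbb{E}}_{t_l}[y] - \hat{\mathbb{E}}_{t_{l-1}}[y]}{1 + \lambda / N(t_{l-1})},
$$

where $\hat{\mathbb{E}}_t[y]$ is the mean response of the training samples in node $t$, $N(t)$ is their count, and $\lambda \ge 0$ is the single regularization parameter; $\lambda = 0$ recovers the original tree.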
The method's core strength lies in its simplicity and flexibility. Once a tree is grown, HS replaces the mean response at each leaf with a weighted average of that leaf's mean and its ancestors' means. Across a range of real-world datasets, this recalibration substantially reduces generalization error.
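As an illustration, here is a minimal from-scratch sketch of this recalibration for a fitted scikit-learn regression tree. This is not the authors' released code; it assumes scikit-learn's `tree_` internals (in particular that `tree_.value` is a writable view into the tree's buffer), and the helper name `hierarchical_shrinkage` is ours:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def hierarchical_shrinkage(model, lam):
    """Shrink node predictions toward ancestor means, in place.

    Each node's value becomes the root mean plus a telescoping sum of
    (child mean - parent mean) increments, each damped by
    1 / (1 + lam / N(parent)).
    """
    t = model.tree_
    shrunk = np.empty(t.node_count)
    stack = [(0, None)]  # (node_id, parent_id); node 0 is the root
    while stack:
        node, parent = stack.pop()
        if parent is None:
            shrunk[node] = t.value[node][0][0]  # root keeps its raw mean
        else:
            delta = t.value[node][0][0] - t.value[parent][0][0]
            damp = 1.0 / (1.0 + lam / t.n_node_samples[parent])
            shrunk[node] = shrunk[parent] + delta * damp
        if t.children_left[node] != -1:  # internal node: visit children
            stack.append((t.children_left[node], node))
            stack.append((t.children_right[node], node))
    # Overwrite stored node values so .predict() returns the shrunk
    # estimates; the splits themselves are never touched.
    t.value[:, 0, 0] = shrunk
    return model

# usage sketch on toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] + 0.5 * rng.normal(size=500)
dt = DecisionTreeRegressor(max_leaf_nodes=20, random_state=0).fit(X, y)
hierarchical_shrinkage(dt, lam=10.0)
```

Note that only the stored node values change; the splits, and hence the tree's structure and interpretable form, are untouched.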
The equivalence between HS and ridge regression is a crucial theoretical underpinning: HS can be interpreted as ridge regression applied to a supervised basis of decision stumps attached to the tree's internal nodes. This formulation gives HS a solid statistical grounding as a well-understood form of regularization against overfitting.
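Roughly, in the paper's framing (a sketch; the exact normalizations are in the paper): the tree's prediction can be expanded along each root-to-leaf path and rewritten as a linear model over decision stumps,

$$
\hat{f}(\mathbf{x}) \;=\; \bar{y} \;+\; \sum_{t \,\in\, \text{internal nodes}} \beta_t \, \psi_t(\mathbf{x}),
$$

where $\psi_t$ is the normalized decision stump at node $t$. Because these stumps are orthogonal in the empirical inner product, ridge regression on this basis decouples across nodes and shrinks each coefficient by a factor of the form $N(t) / (N(t) + \lambda)$, which is precisely the per-node damping that HS applies.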
Numerical Results and Observations
Extensive experimentation confirms that HS consistently enhances predictive accuracy for decision trees, even when other regularization mechanisms are applied. When extended to RF, HS not only boosts accuracy but also improves interpretability by simplifying and stabilizing decision boundaries and SHAP values.
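Because HS operates tree by tree, extending it to a forest is a one-line loop. Continuing the earlier sketch (reusing `X`, `y`, and our hypothetical `hierarchical_shrinkage` helper):

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
for est in rf.estimators_:  # each member is a DecisionTreeRegressor
    hierarchical_shrinkage(est, lam=10.0)
# rf.predict() now averages the shrunk trees; no structure changes
```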
In both classification and regression tasks, HS delivered substantial improvements in metrics such as AUC and R². In particular, HS yielded a 6.2% to 10% performance increase over traditional CART-based methods, supporting its use as a general-purpose enhancement for tree-based models.
Comparative analysis with leaf-based shrinkage (LBS), the form of shrinkage built into XGBoost, highlighted HS's superior performance. The advantage is attributed to HS's hierarchical, node-by-node shrinkage along each root-to-leaf path, in contrast to LBS's single shrinkage step at the leaves. HS also outperformed tuning explicit regularization hyperparameters such as tree depth and mtry in RF, underscoring its flexibility and its ability to improve generalization at little computational cost.
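For contrast with the HS formula above, LBS (our paraphrase; the exact form should be checked against the paper) shrinks each leaf mean in a single step based only on that leaf's sample count:

$$
\hat{f}_{\mathrm{LBS}}(\mathbf{x}) \;=\; \frac{\hat{\mathbb{E}}_{t_L}[y]}{1 + \lambda / N(t_L)},
$$

so deep leaves with few samples are damped heavily, but without borrowing strength from intermediate ancestors the way HS does.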
Implications for AI and Future Directions
HS's impact extends to the interpretability of ensemble models. By stabilizing feature-importance measures such as SHAP values, HS improves not only a model's predictions but also the reliability and clarity of the explanations that inform decision-making. This is particularly valuable in sensitive domains like healthcare and criminal justice, where interpretability is as crucial as accuracy.
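One way to examine this in practice (a sketch assuming the third-party `shap` package and the shrunk `rf` from the earlier snippet) is to compute SHAP values for the forest before and after applying HS and compare their variability:

```python
import shap  # third-party package: https://github.com/shap/shap

explainer = shap.TreeExplainer(rf)       # explains the shrunk forest
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)
print(np.abs(shap_values).mean(axis=0))  # mean |SHAP| per feature
```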
The theoretical link between HS and ridge regression points to further ways of advancing tree-based methods through better regularization. Future research may explore alternative shrinkage penalties such as the lasso, potentially broadening HS's applicability across diverse data settings.
Finally, the integration of HS into existing software (the authors release an implementation as part of the open-source imodels Python package) further democratizes access to robust tree-based modeling, letting practitioners and researchers adopt it with little overhead.
In summary, Hierarchical Shrinkage emerges as a potent method that not only fortifies predictive performance but also enhances the interpretability of tree-based models. This paper contributes a significant theoretical and practical advancement in the regularization of decision trees and their ensembles, posing new questions and opportunities for future research in machine learning.