Stable Update of Regression Trees (2402.13655v1)

Published 21 Feb 2024 in cs.LG

Abstract: Updating machine learning models with new information usually improves their predictive performance, yet, in many applications, it is also desirable to avoid changing the model predictions too much. This property is called stability. In most cases when stability matters, so does explainability. We therefore focus on the stability of an inherently explainable machine learning method, namely regression trees. We aim to use the notion of empirical stability and design algorithms for updating regression trees that provide a way to balance between predictability and empirical stability. To achieve this, we propose a regularization method, where data points are weighted based on the uncertainty in the initial model. The balance between predictability and empirical stability can be adjusted through hyperparameters. This regularization method is evaluated in terms of loss and stability and assessed on a broad range of data characteristics. The results show that the proposed update method improves stability while achieving similar or better predictive performance. This shows that it is possible to achieve both predictive and stable results when updating regression trees.
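To make the abstract's idea concrete, below is a minimal illustrative sketch of how an uncertainty-weighted, stability-regularized update of a regression tree could look, assuming scikit-learn's DecisionTreeRegressor. This is not the paper's actual algorithm: the function names, the per-leaf standard-deviation uncertainty estimate, and the weighting formula `gamma / (1 + uncertainty)` are assumptions chosen only to illustrate the general mechanism of weighting data points by the initial model's uncertainty.

```python
# Hypothetical sketch: stability-regularized update of a regression tree.
# The names (stable_update, leaf_uncertainty, gamma) and the weighting
# scheme are illustrative assumptions, not the paper's method.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def leaf_uncertainty(tree, X, y):
    """Per-sample uncertainty proxy: std. dev. of the training targets in
    the leaf each sample falls into (one simple possible choice)."""
    leaves = tree.apply(X)
    stds = np.zeros(len(X))
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        stds[mask] = y[mask].std() if mask.sum() > 1 else 0.0
    return stds


def stable_update(tree0, X_old, y_old, X_new, y_new, gamma=1.0):
    """Refit a tree on old + new data, adding pseudo-examples whose targets
    are the initial tree's predictions. Pseudo-examples are weighted by
    gamma / (1 + uncertainty), so the update is pulled toward the old
    predictions mainly where the initial model was confident."""
    X_all = np.vstack([X_old, X_new])
    y_all = np.concatenate([y_old, y_new])

    # Pseudo-targets: the initial tree's predictions on all data points.
    y_pseudo = tree0.predict(X_all)

    # Higher uncertainty in the initial model -> weaker pull toward it.
    unc = leaf_uncertainty(tree0, X_all, y_all)
    w_pseudo = gamma / (1.0 + unc)

    # Augmented training set: real examples plus prediction-anchored copies.
    X_aug = np.vstack([X_all, X_all])
    y_aug = np.concatenate([y_all, y_pseudo])
    w_aug = np.concatenate([np.ones(len(y_all)), w_pseudo])

    tree1 = DecisionTreeRegressor(min_samples_leaf=5)
    tree1.fit(X_aug, y_aug, sample_weight=w_aug)
    return tree1
```

In this sketch, gamma = 0 reduces to an ordinary refit on the pooled data, while larger gamma values trade predictive fit on the new data for empirical stability with respect to the initial tree, mirroring the hyperparameter-controlled balance described in the abstract.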

