Era Splitting: Invariant Learning for Decision Trees (2309.14496v5)

Published 25 Sep 2023 in cs.LG, cs.AI, and cs.CE

Abstract: Real-life machine learning problems exhibit distributional shifts in the data from one time to another or from one place to another. This behavior is beyond the scope of the traditional empirical risk minimization paradigm, which assumes that data are i.i.d. over time and across locations. The emerging field of out-of-distribution (OOD) generalization addresses this reality with new theory and algorithms that incorporate "environmental", or "era-wise", information into the learning process. So far, most research has focused on linear models and/or neural networks. In this research, we develop two new splitting criteria for decision trees, which allow us to apply ideas from OOD generalization research to decision tree models, namely gradient boosting decision trees (GBDTs). The new splitting criteria use era-wise information associated with the data to grow tree-based models that are optimal across all disjoint eras in the data, instead of optimal over the entire data set pooled together, which is the default setting. In this paper, the two new splitting criteria are defined and analyzed theoretically. Their effectiveness is tested in four experiments, ranging from simple synthetic examples to complex real-world applications. In particular, we cast the OOD domain-adaptation problem in the context of financial markets, where the new models outperform state-of-the-art GBDT models on the Numerai data set. The new criteria are incorporated into the Scikit-Learn code base and made freely available online.
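To make the core idea concrete, below is a minimal sketch of what an era-wise splitting criterion can look like, assuming variance reduction as the impurity measure and a simple mean over per-era gains as the aggregation rule. The function names (`pooled_gain`, `era_wise_gain`) are illustrative, not the paper's API, and the paper's two actual criteria may aggregate the per-era gains differently (for example, with a soft minimum rather than a mean).

```python
import numpy as np

def pooled_gain(y, mask):
    """Standard pooled variance-reduction gain for a boolean split mask."""
    n = len(y)
    n_left = int(mask.sum())
    n_right = n - n_left
    if n_left == 0 or n_right == 0:
        return 0.0  # degenerate split: no impurity reduction
    return (np.var(y)
            - (n_left / n) * np.var(y[mask])
            - (n_right / n) * np.var(y[~mask]))

def era_wise_gain(y, mask, eras):
    """Evaluate the same candidate split inside each era, then aggregate.

    A split only scores well if it reduces impurity within each era,
    not merely in the pooled data."""
    gains = [pooled_gain(y[eras == e], mask[eras == e])
             for e in np.unique(eras)]
    return float(np.mean(gains))  # a soft minimum would enforce invariance more strictly

# Toy usage: a signal that holds in both eras scores well under both criteria.
rng = np.random.default_rng(0)
eras = np.repeat([0, 1], 100)
x = rng.normal(size=200)
y = (x > 0).astype(float) + 0.1 * rng.normal(size=200)
mask = x > 0
print(pooled_gain(y, mask), era_wise_gain(y, mask, eras))
```

The contrast this sketch is meant to show: the pooled criterion can reward a split whose gain comes from a single era, while the era-wise criterion scores the same split inside every era separately, so only splits that hold up across eras receive a high gain.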

