Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise (2405.17672v1)
Abstract: In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, leaving a gap in the study of interpretable models, particularly those based on decision trees. In this study, we investigate whether ideas from deep-learning loss design can be applied to improve the robustness of decision trees. In particular, we show that loss correction and symmetric losses, both standard approaches, are not effective. We argue that other directions need to be explored to improve the robustness of decision trees to label noise.
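To make the two deep-learning ideas named in the abstract concrete, below is a minimal NumPy sketch, not code from the paper: the function names, the binary setup, and the known 20% symmetric noise rate are all illustrative assumptions. Part (1) shows backward loss correction in the style of Patrini et al. (2017), where each sample's per-class loss vector is multiplied by the inverse of the label-noise transition matrix T; part (2) checks the defining property of a symmetric loss, that the per-class losses sum to a constant for every prediction, using mean absolute error as the example.

```python
import numpy as np

# (1) Backward loss correction (in the style of Patrini et al., 2017).
# T[i, j] = P(observed label j | true label i); here, 20% symmetric noise.
def backward_corrected_loss(probs, noisy_labels, T):
    eps = 1e-12
    per_class = -np.log(np.clip(probs, eps, 1.0))    # (n, k) cross-entropy per class
    corrected = per_class @ np.linalg.inv(T).T       # apply T^{-1} to each loss vector
    # Pick the entry matching each observed (possibly flipped) label.
    return corrected[np.arange(len(noisy_labels)), noisy_labels].mean()

T = np.array([[0.8, 0.2],
              [0.2, 0.8]])
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7]])                       # predicted class probabilities
noisy_y = np.array([0, 1])                           # observed noisy labels
print(backward_corrected_loss(probs, noisy_y, T))

# (2) Symmetric loss: mean absolute error satisfies sum_c l(p, c) = const
# for every prediction p, the property behind its provable robustness
# to symmetric label noise.
def mae_loss(p, label):
    one_hot = np.eye(len(p))[label]
    return np.abs(p - one_hot).sum()

p = np.array([0.3, 0.7])
print(mae_loss(p, 0) + mae_loss(p, 1))               # always 2.0 in the binary case
```

The paper's finding is that transplanting these two recipes into decision-tree learning (e.g., into the splitting criterion) does not yield the robustness gains they provide for deep networks; the sketch only illustrates the mechanics of the techniques themselves.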