Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification (2207.09237v2)
Abstract: Semi-supervised learning (SSL) is a common approach to learning predictive models from both labeled and unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention from the research community, it has not been properly investigated for complex prediction tasks with structurally dependent variables. This is the case for multi-label classification and hierarchical multi-label classification, which may require additional information, possibly coming from the underlying distribution in the descriptive space provided by unlabeled examples, to better address the challenging task of predicting multiple class labels simultaneously. In this paper, we investigate this aspect and propose a (hierarchical) multi-label classification method based on semi-supervised learning of predictive clustering trees. We also extend the method to ensemble learning and propose a method based on the random forest approach. An extensive experimental evaluation conducted on 23 datasets shows significant advantages of the proposed method and its extension over their supervised counterparts. Moreover, the method preserves interpretability and reduces the time complexity of classical tree-based models.
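The idea of drawing on the descriptive space of unlabeled examples when growing a tree can be illustrated with a small sketch. The abstract does not give the split heuristic, so everything below is an assumption made for illustration: the function names, the trade-off parameter `w`, and the use of plain variance for both labels and features are hypothetical, and any normalization a real implementation might apply is omitted. The sketch scores a candidate split by the reduction of an impurity that mixes label variance (computed on labeled rows only) with feature variance (computed on all rows).

```python
import numpy as np

def variance_impurity(values: np.ndarray) -> float:
    """Mean per-column variance of a 2-D array; 0.0 if it has no rows."""
    if values.shape[0] == 0:
        return 0.0
    return float(np.mean(np.var(values, axis=0)))

def ssl_impurity(X, Y, labeled, w):
    """Hypothetical semi-supervised impurity (assumed form, not from the abstract).

    w = 1.0 uses only the labels of labeled rows (fully supervised);
    w = 0.0 uses only the descriptive features of all rows (pure clustering).
    """
    target_part = variance_impurity(Y[labeled])   # labeled rows only
    descriptive_part = variance_impurity(X)       # labeled and unlabeled rows
    return w * target_part + (1.0 - w) * descriptive_part

def split_score(X, Y, labeled, feature, threshold, w):
    """Impurity reduction of the test X[:, feature] <= threshold."""
    left = X[:, feature] <= threshold
    n = X.shape[0]
    parent = ssl_impurity(X, Y, labeled, w)
    children = sum(
        mask.sum() / n * ssl_impurity(X[mask], Y[mask], labeled[mask], w)
        for mask in (left, ~left)
    )
    return parent - children

# Toy data: two tight groups of points, one labeled example in each group.
# Rows of Y for unlabeled examples are placeholders and are never read.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1], [0, 0], [0, 0]])
labeled = np.array([True, False, False, True, False, False])

for w in (1.0, 0.5):
    between = split_score(X, Y, labeled, feature=0, threshold=0.5, w=w)
    edge = split_score(X, Y, labeled, feature=0, threshold=0.05, w=w)
    print(f"w={w}: split between groups={between:.3f}, split at edge={edge:.3f}")
```

On this toy data the supervised setting (`w = 1.0`) cannot tell the two thresholds apart, because both separate the two labeled examples equally well. With `w = 0.5` the unlabeled points make the threshold that falls between the two groups score higher.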
Authors: Jurica Levatić, Michelangelo Ceci, Dragi Kocev, Sašo Džeroski