Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions (2309.11963v1)
Abstract: This study introduces a hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to improve classification performance on multi-class datasets through hierarchical classification (HC). The method generates a hierarchy without requiring explicit hierarchy information, which makes it suitable for datasets that come with no prior knowledge of a class hierarchy. By repeatedly dividing the classes into two subsets according to how well the classifier can discriminate between them, the approach constructs a binary tree of hierarchical classes. It is evaluated on 46 multi-class time series datasets using two popular classifiers (svm and rocket) and three SSFs (potr, srtr, and lsoo). With rocket as the classifier, the approach significantly improves classification performance on approximately half of the datasets; with svm, on approximately a third. The study also explores the relationship between dataset features and HC performance. The number of classes and the flat classification (FC) score are consistently significant, although the results vary across splitting functions. Overall, the approach is a promising strategy for improving classification by generating a hierarchical structure in multi-class time series datasets. Future research directions include exploring other splitting functions, classifiers, and hierarchy structures, as well as applying the approach to domains beyond time series data. The source code is openly available to support reproducibility and further exploration of the method.
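The recursive divide-into-two-subsets idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation and does not reproduce potr, srtr, or lsoo. The names `Node`, `random_bipartition`, `build_hierarchy`, `n_trials`, and the `score` callable are all hypothetical. The sketch assumes the caller supplies `score(left, right)`, a number that is larger when the two class subsets are easier to tell apart (in practice this might be a cross-validated accuracy of a classifier trained to separate them), and it simply keeps the best of a few random bipartitions at each node.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One node of the binary class hierarchy (hypothetical structure)."""
    classes: frozenset
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def random_bipartition(classes, rng):
    """Randomly split a set of >= 2 classes into two non-empty subsets."""
    items = sorted(classes)
    rng.shuffle(items)
    cut = rng.randint(1, len(items) - 1)  # both sides stay non-empty
    return frozenset(items[:cut]), frozenset(items[cut:])


def build_hierarchy(classes, score: Callable, n_trials=20, rng=None):
    """Recursively divide classes in two, keeping the best-scoring random split.

    `score(left, right)` is an assumed, caller-supplied discriminability
    measure; higher means the two subsets are easier to separate.
    """
    if rng is None:
        rng = random.Random(0)
    node = Node(frozenset(classes))
    if len(node.classes) < 2:
        return node  # a single class is a leaf
    candidates = [random_bipartition(node.classes, rng) for _ in range(n_trials)]
    left, right = max(candidates, key=lambda split: score(*split))
    node.left = build_hierarchy(left, score, n_trials, rng)
    node.right = build_hierarchy(right, score, n_trials, rng)
    return node


def leaves(node):
    """Collect the class sets stored at the leaves, left to right."""
    if node.left is None:
        return [node.classes]
    return leaves(node.left) + leaves(node.right)


# Toy usage: classes are integers, and "discriminability" is the smallest
# gap between any class on the left and any class on the right.
def gap_score(left, right):
    return min(abs(a - b) for a in left for b in right)


tree = build_hierarchy({0, 1, 10, 11}, gap_score)
```

In this toy setup every leaf holds exactly one class and each internal node's children partition its class set. Which split is chosen at each node depends on the random candidates drawn, so a real splitting function would replace both the sampling step and the toy `gap_score`.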