Time Series Clustering With Random Convolutional Kernels (2305.10457v2)
Abstract: Time series data, with applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open problem is time series clustering, which is crucial for processing large volumes of unlabeled time series data and unlocking valuable insights. Traditional and modern analysis methods, however, often struggle with these complexities. To address these limitations, we introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters. Through extensive evaluations, R-Clustering demonstrates superior performance over existing methods in terms of clustering accuracy, computational efficiency, and scalability. Empirical results on the UCR archive confirm the effectiveness of our approach across diverse time series datasets. These findings highlight the significance of R-Clustering in various domains and applications, contributing to the advancement of time series data mining.
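The abstract describes the core idea: transform each series with randomly parameterized convolutional kernels and cluster the resulting features. The paper's exact pipeline is not given here, so the following is only a minimal sketch of that general recipe (ROCKET-style random kernels with PPV pooling, dimensionality reduction, then K-means); the kernel lengths, feature count, and pooling choice are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def random_kernel_features(X, n_kernels=100):
    """Map each series to PPV features from random 1-D convolutions.

    PPV = proportion of positive values, i.e. the fraction of
    convolution outputs exceeding a random bias (as in ROCKET).
    """
    n, length = X.shape
    feats = np.zeros((n, n_kernels))
    for k in range(n_kernels):
        klen = rng.choice([7, 9, 11])        # assumed kernel lengths
        w = rng.normal(size=klen)
        w -= w.mean()                        # center the random weights
        b = rng.uniform(-1, 1)               # random bias threshold
        for i in range(n):
            conv = np.convolve(X[i], w, mode="valid")
            feats[i, k] = np.mean(conv > b)  # PPV pooling
    return feats

# Toy data: 20 noisy sine waves vs. 20 noisy ramps
t = np.linspace(0, 2 * np.pi, 100)
X = np.vstack(
    [np.sin(t) + rng.normal(scale=0.1, size=100) for _ in range(20)]
    + [np.linspace(0, 1, 100) + rng.normal(scale=0.1, size=100) for _ in range(20)]
)

F = random_kernel_features(X)
F = PCA(n_components=5).fit_transform(F)     # reduce feature dimensionality
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(F)
```

Because the kernels are fixed at random rather than trained, the transform requires no labels or gradient descent, which is what makes the approach cheap and scalable relative to learned deep representations.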