DTW+S: Shape-based Comparison of Time-series with Ordered Local Trend (2309.03579v3)
Abstract: Measuring distance or similarity between time-series data is a fundamental aspect of many applications, including classification, clustering, and ensembling/alignment. Existing measures may fail to capture similarities among local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications where time-series consist of an ordered sequence of meaningful local trends, such as in epidemics (a surge, then an increase, then a peak, then a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, in which each column represents local trends, and then applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in several tasks. For the clustering of epidemic curves, we show that DTW+S is the only measure, compared with the baselines, that produces good clustering. For ensemble building, we propose a combination of DTW+S and barycenter averaging that results in the best preservation of the characteristics of the underlying trajectories. We also demonstrate that our approach results in better classification than Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.
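The two-stage idea described in the abstract (build a matrix whose columns describe local trends, then run Dynamic Time Warping over the columns) can be illustrated with a minimal sketch. The abstract does not say how the "closeness-preserving" representation is constructed, so the descriptor below (a sliding window of first differences) is only a hypothetical stand-in and not the paper's representation. The function names, the `window` parameter, and the Euclidean distance between columns are likewise assumptions made for illustration.

```python
from math import inf, sqrt
from typing import List, Sequence


def local_trend_columns(series: Sequence[float], window: int = 3) -> List[List[float]]:
    """Hypothetical stand-in for a local-trend matrix.

    Each returned list is one "column": the first differences of the
    series inside a sliding window. This is NOT the paper's
    closeness-preserving representation, only an illustrative descriptor.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [diffs[i:i + window] for i in range(len(diffs) - window + 1)]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two columns (an assumed column metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(a: List[List[float]], b: List[List[float]]) -> float:
    """Classic dynamic-programming DTW over two sequences of columns."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def shape_distance(x: Sequence[float], y: Sequence[float], window: int = 3) -> float:
    """DTW applied to the (assumed) local-trend columns of two series."""
    return dtw_distance(local_trend_columns(x, window), local_trend_columns(y, window))


if __name__ == "__main__":
    peak = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
    shifted_peak = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0]  # same shape, one step later
    ramp = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]        # different shape
    print(shape_distance(peak, shifted_peak))
    print(shape_distance(peak, ramp))
```

Under these assumptions, a peak and a copy of it shifted by one time step are at distance zero, because the warping absorbs the shift, while a series with a different shape is at a positive distance. Because the stand-in descriptor uses differences, adding a constant offset to a series also leaves the distance unchanged, which loosely mirrors the abstract's emphasis on local trends rather than scale.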