
An Empirical Evaluation of Similarity Measures for Time Series Classification (1401.3973v1)

Published 16 Jan 2014 in cs.LG, cs.CV, and stat.ML

Abstract: Time series are ubiquitous, and a measure to assess their similarity is a core part of many computational systems. In particular, the similarity measure is the most essential ingredient of time series clustering and classification systems. Because of this importance, countless approaches to estimate time series similarity have been proposed. However, there is a lack of comparative studies using empirical, rigorous, quantitative, and large-scale assessment strategies. In this article, we provide an extensive evaluation of similarity measures for time series classification following the aforementioned principles. We consider 7 different measures coming from alternative measure `families', and 45 publicly-available time series data sets coming from a wide variety of scientific domains. We focus on out-of-sample classification accuracy, but in-sample accuracies and parameter choices are also discussed. Our work is based on rigorous evaluation methodologies and includes the use of powerful statistical significance tests to derive meaningful conclusions. The obtained results show the equivalence, in terms of accuracy, of a number of measures, but with one single candidate outperforming the rest. Such findings, together with the followed methodology, invite researchers on the field to adopt a more consistent evaluation criteria and a more informed decision regarding the baseline measures to which new developments should be compared.

Citations (207)

Summary

  • The paper provides an extensive empirical evaluation of seven similarity measures for classifying diverse time series data.
  • It uses a one-nearest neighbor classifier with 3-fold cross-validation and robust statistical tests to assess classification accuracy.
  • Results show that Time-Warped Edit Distance (TWED) often outperforms other measures, highlighting the importance of context-driven similarity selection.

An Empirical Evaluation of Similarity Measures for Time Series Classification

The paper "An Empirical Evaluation of Similarity Measures for Time Series Classification" provides an extensive analysis of various similarity measures used in time series classification tasks. Time series data is prevalent across numerous scientific fields, necessitating accurate and efficient similarity measures for effective classification and clustering. This paper evaluates seven distinct similarity measures across 45 publicly-available time series datasets from diverse scientific domains.

The core contribution is an empirical evaluation that follows rigorous methodologies, centered on out-of-sample classification accuracy and backed by robust statistical significance testing. This systematic approach addresses the lack of comparative studies based on quantitative, large-scale assessment strategies. Beyond assessing the accuracy of the seven measures, the paper also provides insights into parameter choices and potential biases in existing evaluation methodologies.

Similarity Measures Evaluated

The paper evaluates the following similarity measures:

  • Euclidean Distance: A classic lock-step measure operating on aligned samples.
  • Fourier Coefficients (FC): Evaluates similarity based on features extracted through the Fourier transform.
  • Auto-regressive Models (AR): Utilizes model parameters for assessing time series similarity.
  • Dynamic Time Warping (DTW): An elastic measure allowing non-linear alignments between time series.
  • Edit Distance on Real Sequences (EDR): An edit-distance-based measure extended to real-valued sequences.
  • Time-Warped Edit Distance (TWED): A hybrid approach incorporating both alignment and edit operations.
  • Minimum Jump Costs Dissimilarity (MJC): An elastic measure based on iterative cost minimization of sample jumps between time series.
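
To make the lock-step versus elastic distinction concrete, here is a minimal sketch of the first and fourth measures above. The function names are illustrative, and the warping-window constraint that the paper tunes for DTW is omitted for brevity:

```python
import math

def euclidean(a, b):
    """Lock-step measure: compares samples at identical indices only."""
    assert len(a) == len(b), "Euclidean distance needs equal-length series"
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Elastic measure: dynamic programming over all monotone alignments.

    Unconstrained variant; the paper additionally tunes a warping-window
    width as a percentage of the series length, which is omitted here.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Note how DTW can align a four-sample plateau with a two-sample one at zero cost, whereas the Euclidean distance cannot even be computed for series of unequal length.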

Methodological Rigor

The authors employ a one-nearest-neighbor (1NN) classifier so that classification accuracy directly reflects the quality of each measure, using 3-fold cross-validation to estimate that accuracy. The statistical significance of differences in error ratios is assessed with the Wilcoxon signed-rank test, adjusted for multiple comparisons via the Holm-Bonferroni method. This methodological approach ensures the robustness of the conclusions drawn from the empirical findings.
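
The core of this protocol can be sketched in a few lines: a 1NN classifier parameterized by an arbitrary similarity measure, plus an out-of-sample error ratio. The names below are illustrative, not taken from the paper:

```python
def euclidean(a, b):
    """Simple reference distance, redefined here so the sketch is self-contained."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def one_nn_classify(train, query, dist):
    """Label a query series with the class of its nearest training series.

    `train` is a list of (series, label) pairs; `dist` is any similarity
    measure (Euclidean, DTW, TWED, ...) -- the only component the paper
    varies across experiments.
    """
    best_label, best_d = None, float("inf")
    for series, label in train:
        d = dist(series, query)
        if d < best_d:
            best_d, best_label = d, label
    return best_label

def error_ratio(train, test, dist):
    """Out-of-sample error ratio: fraction of misclassified test series."""
    errors = sum(one_nn_classify(train, s, dist) != y for s, y in test)
    return errors / len(test)
```

Because 1NN has no parameters of its own, differences in the resulting error ratios can be attributed to the similarity measure rather than to the classifier.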

Key Findings

The evaluation reveals that no single measure consistently outperforms all others across all datasets. However, Time-Warped Edit Distance (TWED) emerges as statistically superior in several cases, suggesting it should serve as a benchmark for new similarity measures in time series classification. Interestingly, although Dynamic Time Warping (DTW) has long been the de facto benchmark, it forms, together with EDR and MJC, a cluster of measures with statistically indistinguishable performance across a broad set of datasets.
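
Since TWED is the paper's strongest performer, a sketch of its recurrence may be useful. This follows Marteau's (2009) formulation; the defaults for the stiffness `nu` and edit penalty `lam` below are illustrative placeholders rather than the paper's tuned values, and timestamps are taken to be the sample indices, as for evenly sampled series:

```python
def twed(a, b, nu=0.001, lam=1.0):
    """Time-Warped Edit Distance, a hedged sketch of Marteau's recurrence.

    nu  -- stiffness: penalizes warping through timestamp differences
    lam -- constant penalty added to each delete (edit) operation
    """
    INF = float("inf")
    # Pad each series with a zero sample at time 0 so the recurrence
    # can always look one step back.
    A = [0.0] + list(a)
    B = [0.0] + list(b)
    n, m = len(A), len(B)
    D = [[INF] * m for _ in range(n)]
    D[0][0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            # Delete in A: remove sample A[i]; consecutive indices make
            # the timestamp gap exactly 1, so the stiffness term is nu.
            del_a = D[i - 1][j] + abs(A[i] - A[i - 1]) + nu + lam
            # Delete in B: remove sample B[j].
            del_b = D[i][j - 1] + abs(B[j] - B[j - 1]) + nu + lam
            # Match A[i] with B[j], also comparing the preceding samples;
            # both timestamp differences equal |i - j| here.
            match = (D[i - 1][j - 1]
                     + abs(A[i] - B[j]) + abs(A[i - 1] - B[j - 1])
                     + nu * 2 * abs(i - j))
            D[i][j] = min(del_a, del_b, match)
    return D[n - 1][m - 1]
```

The hybrid character the summary mentions is visible in the recurrence: the delete branches carry an edit-distance-style constant penalty `lam`, while the match branch carries a DTW-style alignment cost weighted by `nu`.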

Implications and Future Directions

The results suggest that future research into time series similarity measures should include TWED as a baseline for comparison. The paper indicates some datasets might benefit from specific measures, highlighting the need for context-driven selection of similarity measures based on data characteristics.

The paper discusses the potential need to reevaluate the parameter range for some measures, such as TWED, where optimal choices lie at the parameter range boundaries. Practically, the insights from this work could inform the development of time series classification systems, improving their efficacy by selecting the appropriate similarity measure for given datasets.
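
The boundary issue can be checked mechanically: after tuning a parameter in-sample, flag any optimum that lands on an edge of the searched grid, since that suggests the range should be extended. A minimal sketch, with hypothetical names:

```python
def tune_parameter(grid, in_sample_error):
    """Pick the grid value minimizing in-sample error, and flag optima
    that sit on the grid boundary -- a hint that the searched range
    should be widened (as the paper observes for TWED's parameters).

    `grid` is an ordered list of candidate values; `in_sample_error`
    maps a candidate value to its cross-validated error.
    """
    errors = [in_sample_error(v) for v in grid]
    best_i = min(range(len(grid)), key=errors.__getitem__)
    at_boundary = best_i in (0, len(grid) - 1)
    return grid[best_i], at_boundary
```

In a full pipeline, `in_sample_error` would wrap the cross-validated 1NN error for the measure under test; here it is left abstract.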

Looking toward future research, the paper suggests investigating post-processing steps that could improve similarity assessments beyond current methodologies. Additionally, work could focus on developing robust evaluation frameworks for similarity measures that help the field mature further.

In conclusion, the comprehensive empirical analysis presented in this paper not only benchmarks existing approaches but also encourages a standardized evaluation methodology that could guide future developments in time series classification.