
A review on distance based time series classification (1806.04509v1)

Published 12 Jun 2018 in stat.ML and cs.LG

Abstract: Time series classification is an increasing research topic due to the vast amount of time series data that are being created over a wide variety of fields. The particularity of the data makes it a challenging task and different approaches have been taken, including the distance based approach. 1-NN has been a widely used method within distance based time series classification due to its simplicity but still good performance. However, its supremacy may be attributed to being able to use specific distances for time series within the classification process and not to the classifier itself. With the aim of exploiting these distances within more complex classifiers, new approaches have arisen in the past few years that are competitive or which outperform the 1-NN based approaches. In some cases, these new methods use the distance measure to transform the series into feature vectors, bridging the gap between time series and traditional classifiers. In other cases, the distances are employed to obtain a time series kernel and enable the use of kernel methods for time series classification. One of the main challenges is that a kernel function must be positive semi-definite, a matter that is also addressed within this review. The presented review includes a taxonomy of all those methods that aim to classify time series using a distance based approach, as well as a discussion of the strengths and weaknesses of each method.


Summary

  • The paper presents a comprehensive review of distance-based time series classification methods, highlighting the effectiveness of 1-NN combined with DTW.
  • It details how transforming time series into distance features (global, local, and embedded) lets traditional classifiers be applied to time series.
  • The study compares indefinite and definite distance kernels, discussing their trade-offs and suggesting future research directions to enhance efficiency.

Time Series Classification Using Distance-Based Approaches: An Expert Overview

The paper "A Review on Distance Based Time Series Classification" provides a comprehensive overview of methods for classifying time series data based on distance measures. The paper acknowledges the increasing volume of time series data across various domains and highlights the unique challenges posed by the temporal nature of such data. The authors categorize time series classification methods into three primary approaches (feature-based, model-based, and distance-based) and focus on the last of these.

Distance-Based Time Series Classification (TSC)

The paper primarily discusses the distance-based approach, which includes methods that utilize (dis)similarity measures as a key component of classification. Within this approach, the paper identifies three distinct strategies:

  1. k-Nearest Neighbour (k-NN): The simplicity and effectiveness of the 1-NN classifier, especially when coupled with sophisticated distance metrics like Dynamic Time Warping (DTW), are highlighted. Despite its limitations, such as sensitivity to noise and computational cost, 1-NN remains a robust baseline for TSC tasks.
  2. Distance Features: This strategy is subdivided into three categories:
    • Global Distance Features: Methods that transform each time series into a feature vector by calculating distances to other series in the dataset. Approaches like those using DTW distance as features with SVMs have shown competitive results.
    • Local Distance Features: These methods rely on shapelets, which are subsequences that capture localized patterns characteristic of a class. Each series is transformed into its distances to these shapelets, which aids both interpretability and accuracy.
    • Embedded Features: These methods involve embedding time series into Euclidean space to retain distance relationships, thereby facilitating the use of conventional classifiers. However, issues like handling new test instances consistently still pose challenges.
  3. Distance Kernels: The paper explores kernel-based methods, examining both indefinite and definite kernels.
    • Indefinite Distance Kernels: These build kernels directly from time series distances, for example with the Gaussian Distance Substitution kernel, but the resulting kernels are often not positive semi-definite (PSD). Solutions such as potential support vector machines (P-SVMs) are discussed, though they require substantial computational resources.
    • Definite Distance Kernels: These kernels are derived by replacing non-linear operators such as min or max with summations. These methods have been shown to offer stability in classification while preserving the temporal aspects inherent to time series.
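
To make the first two strategies concrete, here is a minimal sketch in pure Python of 1-NN classification with DTW and of a global distance feature transform. It is an illustration under stated assumptions, not the procedure of any specific method in the review: the DTW variant (squared pointwise cost, no warping window), the function names, and the toy data are all chosen for this example.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW with squared pointwise cost (one common variant)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = cost of the best alignment of a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return math.sqrt(acc[n][m])


def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to `query` under DTW."""
    dists = [dtw_distance(query, s) for s in train_series]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    return train_labels[nearest]


def global_distance_features(series, references):
    """Row i holds the DTW distances from series[i] to every reference series."""
    return [[dtw_distance(s, r) for r in references] for s in series]


# Toy data, made up for illustration: a bump at varying positions vs. flat series.
# The series have different lengths, which DTW handles without resampling.
train_series = [
    [0, 0, 1, 2, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
train_labels = ["bump", "bump", "flat", "flat"]
query = [0, 1, 2, 1, 0, 0, 0, 0, 0]

print(predict_1nn(train_series, train_labels, query))  # prints: bump

# Global distance features: each series becomes a fixed-length vector.
features = global_distance_features(train_series, train_series)
```

Each row of `features` is a fixed-length vector, so it could be passed to any conventional classifier, such as the SVMs mentioned above. The double loop makes each DTW call quadratic in the series length, which is the computational cost the text refers to.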

Implications and Future Work

The research underscores the significant role of time series distances in classification accuracy, while also acknowledging the constraints of computational efficiency and scalability. Among the paper's contributions is an in-depth comparison of distance features versus distance kernels and an exploration of the trade-offs between simplicity and performance in using these approaches.

For practitioners, the selection of an appropriate distance measure is critical and domain-specific, suggesting a need for tailored solutions that may include hybrid approaches combining the benefits of feature-based and kernel-based methods.

Theoretically, the paper hints at the potential for developing novel kernels that exploit structural properties of time series, such as alignment, without sacrificing definiteness. Further research on kernel regularization and prototype selection could alleviate some of the efficiency issues present in distance feature methods.
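
The definiteness concern can be examined numerically. The sketch below builds a Gaussian kernel matrix from DTW distances, using the form exp(-d²/σ²) as one common way to substitute a distance into a Gaussian kernel, and inspects its smallest eigenvalue. It also shows eigenvalue clipping as one simple example of regularizing the matrix. The function names, the bandwidth `sigma`, and the random data are assumptions made for this illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Unconstrained DTW with squared pointwise cost (one common variant)."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(np.sqrt(acc[n, m]))


def gds_kernel_matrix(series, sigma=1.0):
    """Gram matrix K[i, j] = exp(-dtw(x_i, x_j)^2 / sigma^2)."""
    n = len(series)
    K = np.ones((n, n))  # the diagonal is exp(0) = 1
    for i in range(n):
        for j in range(i + 1, n):
            d = dtw_distance(series[i], series[j])
            K[i, j] = K[j, i] = np.exp(-(d ** 2) / sigma ** 2)
    return K


def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix; negative means K is not PSD."""
    return float(np.linalg.eigvalsh(K).min())


def clip_spectrum(K):
    """Set negative eigenvalues to zero and rebuild the matrix."""
    w, V = np.linalg.eigh(K)
    return (V * np.clip(w, 0.0, None)) @ V.T


# Random series of varying length, made up for illustration.
rng = np.random.default_rng(0)
series = [rng.normal(size=rng.integers(5, 10)) for _ in range(12)]

K = gds_kernel_matrix(series, sigma=2.0)
print("smallest eigenvalue:", min_eigenvalue(K))
print("after clipping:", min_eigenvalue(clip_spectrum(K)))
```

A negative smallest eigenvalue indicates that the matrix is indefinite on this sample. Clipping yields a PSD matrix for the training set, but it does not by itself say how to treat new test series consistently.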

In conclusion, while the paper refrains from declaring any single method superior, it provides a critical examination of existing techniques and points to areas ripe for innovation, thereby serving as a valuable resource for researchers aiming to advance the field of time series classification.