catch22: CAnonical Time-series CHaracteristics (1901.10200v2)

Published 29 Jan 2019 in cs.IR, cs.LG, and stat.ML

Abstract: Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a generically useful set of 22 CAnonical Time-series CHaracteristics, catch22. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.

Citations (227)

Summary

  • The paper introduces catch22, a set of 22 informative time-series features distilled from a filtered library of 4791 candidates.
  • The reduced set costs an average of about 7% in classification accuracy relative to the full library, in exchange for an approximately 1000-fold reduction in computation time.
  • The features are implemented in C with Python, R, and MATLAB wrappers, and each corresponds to an interpretable time-series property.

Overview of "catch22: CAnonical Time-series CHaracteristics"

The paper "catch22: CAnonical Time-series CHaracteristics" addresses the efficient extraction and use of time-series features for classification tasks. The authors develop and evaluate a reduced set of highly informative features, termed "catch22," derived through a data-driven selection process applied to a diverse collection of time-series classification problems.

Key Contributions

  1. Feature Selection Methodology: The paper introduces a systematic approach to select a minimal, yet highly informative, set of time-series features from an extensive library. The selection process incorporates both performance-based filtering and redundancy minimization, ensuring the retained features are both effective and diverse in capturing time-series dynamics.
  2. catch22 Feature Set: The authors propose a set of 22 features, known as "catch22," that are computationally efficient and incur an average reduction in classification accuracy of about 7% relative to the full set of 4791 features. This is a substantial dimensionality reduction, associated with an approximately 1000-fold reduction in computation time, without a severe compromise in accuracy.
  3. Interdisciplinary Feature Representation: The catch22 features encompass a broad range of time-series characteristics including linear and non-linear autocorrelation, distribution properties, and fluctuation dynamics. This diversity makes the feature set generally applicable across various domains like finance, medicine, and industrial monitoring.
  4. Implementation and Accessibility: The feature set is implemented in C with wrappers in Python, R, and MATLAB, facilitating ease of integration into existing workflows. The near-linear computational complexity ($\mathcal{O}(N^{1.16})$) with respect to time-series length underscores its practicality for analyzing large and complex datasets.
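The two-stage idea in the first contribution (keep features that perform well, then discard those that are redundant with ones already kept) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a greedy filter based on the absolute Pearson correlation between per-dataset accuracy vectors, and the thresholds, feature names, and accuracy values are all hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(perf, min_mean_acc=0.6, max_abs_corr=0.8):
    """perf maps a feature name to its list of per-dataset accuracies.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Stage 1: performance filter. Keep features with a high mean accuracy.
    strong = {name: acc for name, acc in perf.items() if mean(acc) >= min_mean_acc}

    # Stage 2: greedy redundancy filter. Visit features from best to worst and
    # keep one only if its accuracy profile is not strongly correlated with
    # the profile of any feature already kept.
    ranked = sorted(strong, key=lambda name: mean(strong[name]), reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(strong[name], strong[kept])) <= max_abs_corr
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical accuracies of four features on four datasets.
perf = {
    "acf_lag1":         [0.90, 0.60, 0.80, 0.70],
    "acf_lag1_variant": [0.88, 0.58, 0.79, 0.69],  # near-duplicate of acf_lag1
    "outlier_score":    [0.79, 0.79, 0.59, 0.79],  # different accuracy profile
    "noise_feature":    [0.50, 0.52, 0.48, 0.50],  # weak performer
}
selected = select_features(perf)
print(selected)
```

In this toy input the weak feature is removed by the performance filter and the near-duplicate by the redundancy filter, leaving two features with distinct accuracy profiles. A clustering-based grouping of similar features would be a natural alternative to the greedy pass used here.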

Experimental Validation

The catch22 features were evaluated on a benchmark comprising 93 datasets from the UEA/UCR time-series classification repository. The authors found that these features maintained robust performance across this diverse collection, with classification accuracy close to that of the full library at a fraction of the computational cost. Compared with traditional shape-based classifiers, such as dynamic time warping (DTW), catch22 exhibited competitive performance, often excelling on datasets where class distinctions were based on characteristic dynamical properties rather than shape.
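The distinction between dynamical properties and shape can be illustrated with a toy example. The sketch below assumes a single hand-written feature, lag-1 autocorrelation, chosen only because linear autocorrelation is one of the property families the abstract mentions; it is not claimed to be one of the 22 catch22 features. Pointwise Euclidean distance stands in for a shape-based comparison and is not DTW. The signals and variable names are hypothetical.

```python
import math


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a sequence."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def euclidean(x, y):
    """Pointwise Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


n = 500
# Two members of a "slow oscillation" class that differ only in phase,
# and one member of a "fast oscillation" class.
slow_a = [math.sin(0.1 * t) for t in range(n)]
slow_b = [math.sin(0.1 * t + math.pi) for t in range(n)]
fast = [math.sin(2.5 * t) for t in range(n)]

# Feature view: the two slow signals look alike, the fast one does not.
feature_gap_same_class = abs(lag1_autocorr(slow_a) - lag1_autocorr(slow_b))
feature_gap_other_class = abs(lag1_autocorr(slow_a) - lag1_autocorr(fast))

# Pointwise view: the phase shift makes the same-class pair look far apart.
dist_same_class = euclidean(slow_a, slow_b)
dist_other_class = euclidean(slow_a, fast)

print(feature_gap_same_class, feature_gap_other_class)
print(dist_same_class, dist_other_class)
```

Here the feature separates the classes while the pointwise distance places the phase-shifted same-class signal farther away than the signal from the other class. An elastic measure such as DTW may behave differently on this particular input, so the example shows only why a dynamical summary can be informative when alignment-based shape is not the distinguishing property.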

Implications and Future Work

The introduction of catch22 exemplifies a shift towards more interpretable and computationally practical approaches in time-series analysis. The ability to reduce computational overhead without sacrificing interpretative power is particularly valuable in real-time and resource-constrained environments. Furthermore, by providing insights into the types of dynamics that distinguish classes, catch22 facilitates a deeper understanding of the underlying processes in complex time-series data.
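To put the reported near-linear scaling in concrete terms, the following worked arithmetic assumes, as an idealization, that runtime follows $T(N) \propto N^{1.16}$ exactly, and compares it with a hypothetical quadratic-time method:

```latex
\[
\frac{T(2N)}{T(N)} = 2^{1.16} \approx 2.23,
\qquad
\frac{T(10N)}{T(N)} = 10^{1.16} \approx 14.5
\]
\[
\text{versus, for } T(N) \propto N^{2}: \qquad
\frac{T(2N)}{T(N)} = 4,
\qquad
\frac{T(10N)}{T(N)} = 100
\]
```

Under this assumption, a tenfold longer time series costs roughly 14.5 times as much to process rather than 100 times, which is what makes the approach plausible for long recordings on constrained hardware.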

Future research could explore the application of this methodology to even larger and more varied datasets beyond the scope of the UEA/UCR repository. Additionally, while catch22 excels in many contexts, its integration with ensemble learning methods could further enhance performance by combining the strengths of feature-based and shape-based approaches. As the field progresses, extensions of this work could involve automated re-calibration of feature subsets for specific application domains to maximize performance gains while maintaining interpretability.

In conclusion, "catch22: CAnonical Time-series CHaracteristics" advances the field of time-series analysis by providing an efficient and interpretable framework for feature selection, balancing the trade-off between computational feasibility and classification power.