- The paper introduces TS-CHIEF, which integrates multiple tree-based splitters to achieve high accuracy and scalability in time series classification.
- TS-CHIEF combines similarity-based, dictionary-based, and interval-based splitters in a diverse tree ensemble, matching HIVE-COTE's accuracy at a fraction of its computational cost.
- Experiments confirm state-of-the-art accuracy on the UCR archive and scalability to large data (130,000 series trained within two days), with speed-ups over HIVE-COTE exceeding 900x.
TS-CHIEF: A Scalable and Accurate Forest Algorithm for Time Series Classification
The paper presents TS-CHIEF, an algorithm designed to address two critical requirements of time series classification (TSC): high classification accuracy and scalability. The motivation arises from the limitations of the prior state of the art, HIVE-COTE, which offers superior accuracy but suffers from impractical computational demands. HIVE-COTE's runtime is polynomial in training set size, making it infeasible for large-scale applications: training on a dataset of just 1,500 series can take around eight days of CPU time.
TS-CHIEF, short for Time Series Combination of Heterogeneous and Integrated Embedding Forest, integrates efficient tree-structured classifiers that harness multiple forms of time series representation developed over the past decade. It is an ensemble classifier that combines the simplicity and efficiency of trees with accuracy competitive with HIVE-COTE, while running significantly faster.
Key Findings and Contributions
- Accuracy and Scalability: TS-CHIEF achieves competitive accuracy with scalability exceeding previous methods. Evaluation on 85 datasets from the University of California, Riverside (UCR) archive confirms state-of-the-art accuracy, and the algorithm processes large datasets in a fraction of the time HIVE-COTE requires.
- Algorithm Structure: TS-CHIEF builds a forest in which each tree, at every node, generates several candidate splits drawn from a diverse set of time-series-specific splitting criteria and keeps the best one. The forest integrates:
- Similarity-based Splitters: modeled on Proximity Forest, using multiple elastic distance measures.
- Dictionary-based Splitters: inspired by the BOSS model, using the Symbolic Fourier Approximation (SFA) transformation.
- Interval-based Splitters: akin to RISE, using time- and frequency-domain features computed over intervals.
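The per-node split selection described above can be sketched in miniature. The following is a simplified illustration, not the authors' implementation: it uses only a similarity-based splitter with Euclidean distance (the paper draws on a pool of elastic measures plus dictionary- and interval-based splitters), routing each series to the branch of its nearest per-class exemplar and keeping the candidate with the lowest weighted Gini impurity.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a collection of class labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def similarity_splitter(X, y, rng):
    """Candidate similarity-based splitter: one random exemplar per class;
    a series is routed to the branch of its nearest exemplar. Euclidean
    distance is a stand-in for the paper's pool of elastic measures."""
    exemplars = {c: X[rng.choice(np.flatnonzero(y == c))] for c in np.unique(y)}
    def split(series):
        return min(exemplars, key=lambda c: np.linalg.norm(series - exemplars[c]))
    return split

def weighted_gini(X, y, split):
    """Weighted Gini impurity of the partition induced by a splitter."""
    branches = {}
    for series, label in zip(X, y):
        branches.setdefault(split(series), []).append(label)
    n = len(y)
    return sum(len(b) / n * gini(b) for b in branches.values())

def best_splitter(X, y, n_candidates=5, seed=0):
    """At a node, generate several candidate splitters and keep the one
    with the lowest weighted Gini impurity (i.e., highest Gini gain)."""
    rng = np.random.default_rng(seed)
    candidates = [similarity_splitter(X, y, rng) for _ in range(n_candidates)]
    return min(candidates, key=lambda s: weighted_gini(X, y, s))
```

Growing a tree then amounts to applying `best_splitter` recursively until each branch is pure; TS-CHIEF additionally mixes dictionary- and interval-based candidates into the pool at every node.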
- Performance Analysis: The paper systematically contrasts TS-CHIEF with HIVE-COTE, showing that TS-CHIEF can be trained on 130,000 time series within two days, a scale unattainable for HIVE-COTE in feasible time. This corresponds to a measured speed-up of over 900x on a smaller subset and an estimated speed-up of over 46,000x at the full scale.
- Ensemble Size and Splitter Contributions: The authors explore the effect of ensemble size on performance and the individual contributions of the different splitting functions. Results indicate that all three splitter types contribute meaningful diversity and predictive power, both of which are fundamental to ensemble accuracy.
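Prediction in a forest of this kind reduces to majority voting over the trees, which is why diversity among the splitters matters: uncorrelated errors cancel out in the vote. A minimal sketch, where each "tree" is a hypothetical stand-in (any callable mapping a series to a class label):

```python
from collections import Counter

def forest_predict(trees, series):
    """Ensemble prediction: every tree casts one vote for a class label,
    and the most common label wins."""
    votes = Counter(tree(series) for tree in trees)
    return votes.most_common(1)[0][0]
```

Larger ensembles make the majority vote more stable, which is consistent with the paper's finding that accuracy improves with ensemble size.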
Implications and Future Directions
TS-CHIEF’s integration of diverse splitting functions within tree-based structures is a promising direction for scalable, accurate, and versatile TSC classifiers. The architecture is particularly suited to applications where time series data is abundant and must be processed rapidly, such as medical data analytics, satellite imagery analysis, and activity recognition.
As strong as the current results are, potential future developments may include:
- Enhancing the efficiency further by optimizing memory usage and computational time.
- Adapting the algorithm to handle multi-variate and variable-length time series.
- Exploring automatic tuning approaches to refine the selection of candidate splitters adaptively based on datasets.
In conclusion, TS-CHIEF represents a significant advance in TSC, combining high accuracy with computational efficiency. By handling extensive datasets with reduced training time, it promises substantial practical impact across domains that depend on time series classification.