- The paper presents TSF, a novel tree-ensemble method that uses the Entrance gain metric to optimize split decisions in time series classification.
- It employs random sampling of interval-based features to shrink the candidate feature space, keeping computational complexity linear in the time series length.
- Experimental results on 45 datasets demonstrate TSF's superior accuracy over methods like NNDTW while providing insights via temporal importance curves.
A Time Series Forest for Classification and Feature Extraction
The paper presents a novel tree-ensemble method, the Time Series Forest (TSF), designed for time series classification. The crux of TSF is a new split evaluation criterion, the Entrance gain, which combines entropy gain with a margin measure to improve the quality of tree splits.
Methodological Contributions
TSF introduces the Entrance gain as a criterion to break ties among candidate splits that exhibit identical entropy gains. The metric favors splits with larger margins, i.e., a greater distance between the split threshold and the nearest observed feature value. To tame the large interval feature space inherent in time series data, the method randomly samples interval sizes and starting positions (on the order of √M of each) at every node, reducing the candidate features from O(M²) to O(M) and keeping the complexity linear in the series length M.
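A minimal sketch of this tie-breaking idea, assuming a single numeric feature and one candidate threshold; the function names, the `alpha` weight, and the handling of empty splits are illustrative choices rather than the paper's implementation:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector (0.0 for an empty split)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def entrance_score(feature, labels, threshold, alpha=1e-6):
    """Entropy gain of a candidate split plus a small margin term.

    The margin is the distance from the threshold to the nearest observed
    feature value; with a tiny alpha it only matters when entropy gains tie.
    alpha and the function name are illustrative, not from the paper's code.
    """
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    gain = entropy(labels) \
        - (len(left) / n) * entropy(left) \
        - (len(right) / n) * entropy(right)
    margin = float(np.min(np.abs(feature - threshold)))
    return gain + alpha * margin
```

At each node, the candidate feature/threshold pair with the highest score would be selected, so the margin term only decides between splits whose entropy gains are equal or nearly so.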
Given a time series classification problem, TSF extracts interval-based features such as the mean, standard deviation, and slope over randomly sampled intervals at each node of the decision tree. This sampling strategy is analogous to the random feature selection used in random forests, preserving computational efficiency and scalability.
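The interval features themselves are simple summary statistics. Below is a rough sketch of how the mean, standard deviation, and slope could be computed over randomly sampled intervals; the array shapes, the √M default for the number of intervals, and all function names are assumptions for illustration:

```python
import numpy as np

def random_interval_features(X, rng, n_intervals=None):
    """Compute mean, standard deviation, and slope over random intervals.

    X is assumed to have shape (n_samples, series_length); the sqrt-based
    default for n_intervals mirrors the paper's sampling idea, while the
    function and argument names are illustrative.
    """
    _, m = X.shape
    if n_intervals is None:
        n_intervals = max(1, int(np.sqrt(m)))
    columns = []
    for _ in range(n_intervals):
        start = rng.integers(0, m - 1)           # random start index
        end = rng.integers(start + 2, m + 1)     # exclusive end, length >= 2
        seg = X[:, start:end]
        t = np.arange(end - start)
        columns.append(seg.mean(axis=1))              # interval mean
        columns.append(seg.std(axis=1))               # interval standard deviation
        columns.append(np.polyfit(t, seg.T, 1)[0])    # slope of least-squares line
    return np.column_stack(columns)

# Example: 20 synthetic series of length 100.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
F = random_interval_features(X, rng)   # shape: (20, 3 * n_intervals)
```

In the forest itself, the intervals would be re-sampled at each node rather than once per dataset, but the feature computation is the same.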
Experimental Insights
Experimental evaluations on 45 benchmark datasets show that TSF outperforms strong competitors, including one-nearest-neighbor classifiers with dynamic time warping (NNDTW), as well as ensembles built with the standard entropy gain criterion alone. This performance holds across datasets with widely varying characteristics, underscoring TSF's robustness and adaptability.
Interpretability and Insights
While individual decision trees are inherently interpretable, the ensemble nature of TSF can obscure interpretation. To mitigate this, the paper introduces the temporal importance curve. For each feature type (mean, standard deviation, slope), the curve sums the entropy gains of all ensemble splits whose intervals cover a given time index, revealing which temporal segments are most relevant for classification.
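A compact sketch of this aggregation, assuming each split has been recorded as a (start, end, entropy_gain) triple for one feature type; the record format and function name are illustrative, not the paper's data structures:

```python
import numpy as np

def temporal_importance_curve(split_records, series_length):
    """Aggregate split entropy gains into a per-time-index importance curve.

    split_records is assumed to be a list of (start, end, entropy_gain)
    triples collected from every split in the ensemble for one feature type
    (e.g. interval means); this record format is an illustrative assumption.
    """
    curve = np.zeros(series_length)
    for start, end, gain in split_records:
        curve[start:end] += gain   # credit the gain to each covered time index
    return curve
```

Plotting the resulting curve against the time axis highlights the intervals the ensemble relied on most.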
Implications and Future Directions
TSF stands out not only for its classification accuracy but also for its potential in domains that require interpretable models, such as finance and medicine. Moreover, the ability to identify key temporal features can yield domain-specific insights, offering the dual utility of classification and feature discovery.
The paper opens avenues for future research, such as extending TSF to handle time series of varying lengths natively, without preprocessing steps such as alignment. Exploring richer feature types while preserving interpretability could further broaden TSF's applicability to more nuanced scenarios.
Overall, this work provides a substantial contribution to time series classification by marrying accuracy with interpretability and computational efficiency, making it a promising approach for both academic inquiry and practical applications.