The UEA multivariate time series classification archive, 2018 (1811.00075v1)

Published 31 Oct 2018 in cs.LG and stat.ML

Abstract: In 2002, the UCR time series classification archive was first released with sixteen datasets. It gradually expanded, until 2015 when it increased in size from 45 datasets to 85 datasets. In October 2018 more datasets were added, bringing the total to 128. The new archive contains a wide range of problems, including variable length series, but it still only contains univariate time series classification problems. One of the motivations for introducing the archive was to encourage researchers to perform a more rigorous evaluation of newly proposed time series classification (TSC) algorithms. It has worked: most recent research into TSC uses all 85 datasets to evaluate algorithmic advances. Research into multivariate time series classification, where more than one series are associated with each class label, is in a position where univariate TSC research was a decade ago. Algorithms are evaluated using very few datasets and claims of improvement are not based on statistical comparisons. We aim to address this problem by forming the first iteration of the MTSC archive, to be hosted at the website www.timeseriesclassification.com. Like the univariate archive, this formulation was a collaborative effort between researchers at the University of East Anglia (UEA) and the University of California, Riverside (UCR). The 2018 vintage consists of 30 datasets with a wide range of cases, dimensions and series lengths. For this first iteration of the archive we format all data to be of equal length, include no series with missing data and provide train/test splits.

Citations (352)


Summary

The paper introduces a new MTSC archive addressing the need for diverse datasets and standardized benchmarks in multivariate time series classification.
It describes how the archive was built, including dataset curation and predefined train/test splits, and reports baseline results for 1-NN classifiers using Euclidean distance and DTW.
The archive advances research by offering comprehensive datasets from areas such as HAR, ECG, EEG, and audio, fostering algorithmic innovation.

Analysis of "The UEA Multivariate Time Series Classification Archive, 2018"

The paper "The UEA Multivariate Time Series Classification Archive, 2018" systematically outlines the development of a new repository aimed at advancing research in multivariate time series classification (MTSC). Authored by Anthony Bagnall and collaborators, it serves as a counterpart to the well-established univariate archive, enriching the domain with diverse datasets and baseline assessment methodologies.

Motivation and Background

Time series classification has advanced considerably, with most of the attention going to the univariate case. The UCR archive has served as the standard benchmark for univariate classifiers, encouraging researchers to evaluate new algorithms on a large set of datasets. MTSC trails behind: algorithms are typically evaluated on very few datasets, and claims of improvement are not backed by statistical comparisons. The paper addresses this gap by curating the MTSC archive, which consists of 30 datasets covering a wide range of cases, dimensions, and series lengths.

Archive Composition

Like the univariate archive, the multivariate archive is a collaboration between researchers at the University of East Anglia and the University of California, Riverside. Its datasets are grouped by application domain:

Human Activity Recognition (HAR): The largest grouping, with nine datasets including BasicMotions, Cricket, and RacketSports. These datasets draw mainly on wearable accelerometer and gyroscope data to classify activities, providing a substantial testing ground for HAR algorithms.
Motion Classification, ECG, and EEG/MEG Classification: These categories illustrate broader applications ranging from medical signal interpretation to gesture recognition, involving datasets like ArticularyWordRecognition and AtrialFibrillation.
Audio Spectra Classification and Miscellaneous: Covering problems from heart sound classification to astronomical light curve analysis, these datasets underscore the versatility of MTSC.

Methodological Approach

All datasets in the archive share a common format so that experiments are comparable. Each comes with a predefined train/test split and, where applicable, the individual dimensions are also available separately. For this first iteration, all series within a dataset are formatted to equal length and no series contains missing data, which simplifies reproducible experimentation.
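The constraints above can be made concrete with a short check. This is a minimal sketch, not the archive's own tooling: it assumes a dataset has already been loaded into NumPy arrays of shape (n_cases, n_dimensions, series_length), and the function name `check_archive_constraints` and the synthetic data are hypothetical, introduced only for illustration.

```python
import numpy as np


def check_archive_constraints(X_train, y_train, X_test, y_test):
    """Check equal-length, no-missing-data, and split consistency.

    Assumed (hypothetical) layout: each X has shape
    (n_cases, n_dimensions, series_length); each y holds one label per case.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        # A rectangular 3-D array already implies equal-length series.
        if X.ndim != 3:
            raise ValueError(f"{name}: expected a 3-D array, got {X.ndim}-D")
        if np.isnan(X).any():
            raise ValueError(f"{name}: contains missing values")
        if len(y) != X.shape[0]:
            raise ValueError(f"{name}: number of labels and cases differ")
    # Train and test must agree on dimensions and series length.
    if X_train.shape[1:] != X_test.shape[1:]:
        raise ValueError("train and test splits have different shapes")
    return {
        "n_train": X_train.shape[0],
        "n_test": X_test.shape[0],
        "n_dimensions": X_train.shape[1],
        "series_length": X_train.shape[2],
    }


# Synthetic placeholder data, not taken from the archive.
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(8, 3, 50)), ["a", "b"] * 4
X_te, y_te = rng.normal(size=(4, 3, 50)), ["a", "b"] * 2
summary = check_archive_constraints(X_tr, y_tr, X_te, y_te)
print(summary)
```

Storing each split as a single rectangular array is only possible because of the equal-length and no-missing-data guarantees; variable-length data would need a different container.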

Baseline Benchmarks

The paper provides initial benchmarks using established classifiers to set a performance baseline. The baselines are 1-nearest-neighbour (1-NN) classifiers using Euclidean distance (ED) and two multivariate variants of dynamic time warping (DTW): dimension-independent DTW, which warps each dimension separately and sums the resulting distances, and dimension-dependent DTW, which warps all dimensions along a single shared path. Accuracy varies widely across the archive: some datasets are classified almost perfectly, while others are barely tractable for these simple models.
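The difference between the two DTW variants is easiest to see in code. The following is a minimal sketch rather than the paper's implementation: it assumes an unconstrained warping window, squared pointwise differences, and series stored as arrays of shape (n_dimensions, series_length). All function names are hypothetical.

```python
import numpy as np


def dtw_cost(cost):
    """Cost of the cheapest warping path through a pointwise cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return float(acc[n, m])


def dtw_dependent(a, b):
    """Dimension-dependent DTW: one warping path shared by all dimensions."""
    diff = a[:, :, None] - b[:, None, :]  # shape (n_dims, len_a, len_b)
    return dtw_cost((diff ** 2).sum(axis=0))


def dtw_independent(a, b):
    """Dimension-independent DTW: warp each dimension alone, then sum."""
    return sum(
        dtw_cost((a[d][:, None] - b[d][None, :]) ** 2)
        for d in range(a.shape[0])
    )


def squared_euclidean(a, b):
    """Squared Euclidean distance over all dimensions and time points."""
    return float(((a - b) ** 2).sum())


def one_nn_predict(X_train, y_train, X_test, distance):
    """Label each test case with the label of its nearest training case."""
    preds = []
    for x in X_test:
        dists = [distance(x, t) for t in X_train]
        preds.append(y_train[int(np.argmin(dists))])
    return preds


# Toy two-dimensional series, invented for illustration only.
a = np.array([[0.0, 1.0, 2.0, 3.0, 2.0], [1.0, 1.0, 0.0, 0.0, 1.0]])
b = np.array([[0.0, 0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 0.0, 0.0]])
X_train, y_train = np.stack([a, a + 5.0]), ["low", "high"]
preds = one_nn_predict(X_train, y_train, np.stack([b]), dtw_dependent)
print(preds, dtw_independent(a, b), dtw_dependent(a, b), squared_euclidean(a, b))
```

Under these assumptions the independent distance never exceeds the dependent one, because each dimension is free to choose its own best path, and the dependent distance never exceeds the squared Euclidean distance, because the diagonal is itself a valid warping path. A smaller distance does not by itself imply a better classifier.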

Implications and Future Directions

The archive is intended to support both theoretical and practical progress in MTSC by giving researchers a common, diverse set of problems on which new algorithms can be compared rigorously. Future versions are expected to expand further, possibly adding new data domains and refined benchmarks.

The work acknowledges the complexity of MTSC tasks and the need for representative, diverse datasets when developing general-purpose solutions. It invites contributions from the research community, in the form of both new data and algorithmic results. In this sense the archive is a foundation for future research in time series classification rather than a finished product.
