Monash University, UEA, UCR Time Series Extrinsic Regression Archive (2006.10996v3)

Published 19 Jun 2020 in cs.LG and stat.ML

Abstract: Time series research has gathered lots of interests in the last decade, especially for Time Series Classification (TSC) and Time Series Forecasting (TSF). Research in TSC has greatly benefited from the University of California Riverside and University of East Anglia (UCR/UEA) Time Series Archives. On the other hand, the advancement in Time Series Forecasting relies on time series forecasting competitions such as the Makridakis competitions, NN3 and NN5 Neural Network competitions, and a few Kaggle competitions. Each year, thousands of papers proposing new algorithms for TSC and TSF have utilized these benchmarking archives. These algorithms are designed for these specific problems, but may not be useful for tasks such as predicting the heart rate of a person using photoplethysmogram (PPG) and accelerometer data. We refer to this problem as Time Series Extrinsic Regression (TSER), where we are interested in a more general methodology of predicting a single continuous value, from univariate or multivariate time series. This prediction can be from the same time series or not directly related to the predictor time series and does not necessarily need to be a future value or depend heavily on recent values. To the best of our knowledge, research into TSER has received much less attention in the time series research community and there are no models developed for general time series extrinsic regression problems. Most models are developed for a specific problem. Therefore, we aim to motivate and support the research into TSER by introducing the first TSER benchmarking archive. This archive contains 19 datasets from different domains, with varying number of dimensions, unequal length dimensions, and missing values. In this paper, we introduce the datasets in this archive and did an initial benchmark on existing models.

Citations (24)

Summary

  • The paper presents a comprehensive TSER archive that benchmarks extrinsic regression challenges using 19 diverse datasets.
  • It evaluates multiple algorithms, with ROCKET achieving the best average performance based on RMSE.
  • The archive spans various real-world domains, offering a framework to catalyze advanced research in TSER.

Monash University, UEA, UCR Time Series Extrinsic Regression Archive

This paper presents the Monash University, UEA, UCR Time Series Extrinsic Regression Archive, aiming to enhance time series extrinsic regression (TSER) research by introducing a comprehensive benchmarking archive. This initiative addresses a notable gap in time series research, particularly with respect to extrinsic regression challenges that have previously been overshadowed by the focus on time series classification (TSC) and time series forecasting (TSF).

Introduction and Motivation

The authors note that progress in machine learning depends heavily on robust benchmark datasets. The repository at the University of California Irvine has served machine learning broadly, and the UCR and UEA time series archives have done the same for TSC, while TSF has relied on forecasting competitions. No analogous resource existed for TSER, a setting where the goal is to predict a single continuous value that need not be a future value of the series itself.

TSER differs from both TSC and TSF. TSC predicts discrete labels and TSF forecasts future values of a series, whereas TSER predicts a single continuous value that need not be a future value of the series or depend heavily on its recent values.
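To make the task framing concrete, here is a minimal sketch of what a TSER example looks like as data: a univariate or multivariate series paired with one continuous target. The names `TSERExample` and `mean_baseline` and the toy values are illustrative assumptions, not part of the archive or the paper's code.

```python
from dataclasses import dataclass
from typing import List

# One series: a list of dimensions, each a list of floats.
Series = List[List[float]]


@dataclass
class TSERExample:
    """Hypothetical container for one TSER instance."""
    series: Series  # predictor time series (one or more dimensions)
    target: float   # single continuous value, possibly external to the series
    # For comparison: a TSC example would carry a discrete label here,
    # and a TSF example would carry future values of the series itself.


def mean_baseline(train: List[TSERExample]) -> float:
    """Trivial reference model: predict the mean training target."""
    return sum(example.target for example in train) / len(train)


# Toy data (made up): two-dimensional sensor readings mapped to a
# heart-rate-like value.
train = [
    TSERExample(series=[[0.1, 0.4, 0.3], [1.0, 0.9, 1.1]], target=72.0),
    TSERExample(series=[[0.5, 0.8, 0.7], [1.4, 1.2, 1.5]], target=88.0),
]
prediction = mean_baseline(train)
```

The mean baseline ignores the series entirely, so it gives a floor that any model using the series should beat.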

TSER Dataset Archive

This paper introduces the first TSER benchmark archive, comprising 19 datasets across diverse domains: energy monitoring, environment monitoring, health monitoring, sentiment analysis, and forecasting. The datasets differ in their characteristics: some are multivariate, some have dimensions of unequal length, some contain missing values, and the number of dimensions varies. They represent practical, real-world TSER scenarios, such as predicting daily appliances' energy usage from hourly sensor data.

The datasets are drawn from several sources, including the UCI machine learning repository, organizations such as the World Health Organization, and purpose-built data such as synthetic flood models from Monash University researchers.

Baseline Evaluation and Results

The paper sets a baseline for TSER by evaluating a variety of algorithms, including:

  • Functional principal component regression (FPCR) with/without B-spline.
  • Support vector regression (SVR).
  • Ensemble methods like random forests (RF) and XGBoost.
  • Neural network-based time series classifiers such as FCN, ResNet, and Inception Network, alongside ROCKET, an efficient classifier based on random convolutional kernels.
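As an illustration of the random convolutional kernel idea behind ROCKET, the sketch below turns a univariate series into features by applying random kernels and keeping two summary statistics per kernel: the maximum output and the proportion of positive values. It is a simplified sketch under stated assumptions, not the ROCKET implementation or the paper's code: it omits padding, samples dilation uniformly, and the function names `make_kernels` and `transform` are hypothetical. The resulting features would typically be passed to a linear model such as ridge regression.

```python
import math
import random
from typing import List, Tuple

Kernel = Tuple[List[float], float, int]  # (weights, bias, dilation)


def make_kernels(num_kernels: int, series_length: int, seed: int = 0) -> List[Kernel]:
    """Sample random kernels (simplified: no padding, uniform dilation)."""
    if series_length < 11:
        raise ValueError("series must have at least 11 points for this sketch")
    rng = random.Random(seed)
    kernels = []
    for _ in range(num_kernels):
        length = rng.choice([7, 9, 11])
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean_w = sum(weights) / length
        weights = [w - mean_w for w in weights]  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Largest dilation for which the dilated kernel still fits the series.
        max_dilation = (series_length - 1) // (length - 1)
        dilation = rng.randint(1, max_dilation)
        kernels.append((weights, bias, dilation))
    return kernels


def transform(series: List[float], kernels: List[Kernel]) -> List[float]:
    """Return [max, proportion of positive values] for each kernel."""
    features = []
    n = len(series)
    for weights, bias, dilation in kernels:
        span = (len(weights) - 1) * dilation
        outputs = [
            bias + sum(w * series[start + j * dilation] for j, w in enumerate(weights))
            for start in range(n - span)
        ]
        features.append(max(outputs))
        features.append(sum(o > 0 for o in outputs) / len(outputs))
    return features


# Toy usage on a made-up sine series.
series = [math.sin(0.3 * t) for t in range(50)]
kernels = make_kernels(num_kernels=20, series_length=len(series))
features = transform(series, kernels)
```

Each kernel contributes two features, so 20 kernels give a 40-dimensional vector regardless of the series length.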

These methods were benchmarked using root mean squared error (RMSE), with standard implementations from Python libraries (e.g., Scikit-Learn). ROCKET achieved the best average ranking among the algorithms. However, traditional machine learning approaches remained competitive, which suggests that methods designed specifically for TSER are needed to improve performance further.
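The evaluation described above combines two steps: an RMSE per model per dataset, then an average rank per model across datasets. A minimal sketch of both follows. The model names and scores are placeholders rather than results from the paper, and the tie handling (tied models share the mean of their ranks) is an assumption.

```python
import math
from typing import Dict, List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """scores[model][i] is the RMSE of `model` on dataset i (lower is better).

    Rank 1 is best; tied models share the mean of the ranks they occupy.
    """
    models = list(scores)
    num_datasets = len(scores[models[0]])
    totals = {m: 0.0 for m in models}
    for i in range(num_datasets):
        column = sorted((scores[m][i], m) for m in models)
        pos = 0
        while pos < len(column):
            end = pos
            while end + 1 < len(column) and column[end + 1][0] == column[pos][0]:
                end += 1
            shared_rank = (pos + 1 + end + 1) / 2  # mean of tied positions
            for _, m in column[pos:end + 1]:
                totals[m] += shared_rank
            pos = end + 1
    return {m: totals[m] / num_datasets for m in models}


# Placeholder scores for three hypothetical models on three datasets.
scores = {
    "model_a": [1.0, 2.0, 0.5],
    "model_b": [1.5, 1.0, 0.5],
    "model_c": [2.0, 3.0, 0.9],
}
ranks = average_ranks(scores)
example_error = rmse([1.0, 2.0], [2.0, 3.0])
```

Ranking per dataset before averaging keeps datasets with large target scales from dominating the comparison, which a plain average of RMSE values would not.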

Implications and Future Directions

This work gives TSER research a structured benchmark that reflects the complexities of real-world data. The archive supports algorithmic innovation and a better understanding of extrinsic regression problems complicated by irregular and multidimensional data.

Future directions include expanding the archive to include more varied datasets, continuously updating the benchmark with cutting-edge algorithms, and improving the coverage of domain-specific challenges within the datasets. This work is set to catalyze significant growth in the TSER landscape, with potential spill-overs in broader predictive modeling and machine learning applications.

The authors acknowledge contributions from various institutions and invite further data donations so that the archive can continue to grow. As TSER becomes more important in time series analysis, this archive is likely to become a key resource for researchers and industry practitioners.