
DrCIF: Diverse Representation Canonical Interval Forest

Updated 12 June 2026
  • The paper introduces DrCIF, which unifies and extends TSF and RISE by extracting phase-dependent features across multiple time series representations.
  • DrCIF computes diverse features from raw, first difference, and periodogram data using a mix of 7 classical and 22 catch22 statistics for robust discrimination.
  • DrCIF employs an ensemble of unpruned time-series trees with randomized interval selection, leading to notable accuracy improvements on UCR and UEA benchmarks.

The Diverse Representation Canonical Interval Forest (DrCIF) is an interval-based ensemble classifier for time series classification, introduced as a core component of the HIVE-COTE 2.0 meta-ensemble. DrCIF unifies and extends the principle of extracting discriminatory phase-dependent features from time series intervals by leveraging multiple transformations, an enlarged and diverse feature pool, and a randomized forest-based learning structure. Its design synthesizes strengths of previous interval classifiers—most notably TSF and RISE—and surpasses them in both accuracy and representational richness by targeting local features across raw, differenced, and frequency domains, utilizing both classical summary statistics and the comprehensive catch22 feature suite (Middlehurst et al., 2021).

1. Motivation and Context

HIVE-COTE’s central thesis is that combining classifiers built on diverse time series representations maximizes classification accuracy due to the complementary discriminatory information encoded across domains. In HIVE-COTE 1.0, interval-based constituents included the Time Series Forest (TSF), which utilizes random intervals with classic summary features, and the Random Interval Spectral Ensemble (RISE), which extracts features from spectral representations. DrCIF replaces both by integrating the strengths of these approaches with substantial extensions: it captures local, phase-sensitive features at multiple scales, across both the time and frequency domains, thus greatly enriching the candidate feature set available to interval-based trees. This design is motivated by empirical observations that representations such as the raw series, its first difference, and its periodogram characterize distinct aspects of data, each useful for discrimination in specific contexts (Middlehurst et al., 2021).

2. Core Representations and Feature Extraction

DrCIF operates on three distinct representations for each series $x$ of length $m$ (for each dimension $1$ to $d$ in multivariate settings):

  • Raw series: $X_1 = (x_1, x_2, \ldots, x_m)$
  • First difference: $\Delta x_i = x_{i+1} - x_i, \quad i=1,\ldots,m-1$ ($X_2$ of length $m-1$)
  • Periodogram: $P(f) = |\sum_{t=1}^m x_t e^{-2\pi i f t/m}|^2, \quad f = 1,\ldots,\lfloor m/2 \rfloor$ ($X_3$ of length $\lfloor m/2 \rfloor$)
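The three representations listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the reference implementation: the function names are hypothetical, and the periodogram uses a direct DFT sum that mirrors the formula term by term rather than an FFT.

```python
import cmath
import math


def first_difference(x):
    """Return x[i+1] - x[i] for consecutive points (length m - 1)."""
    return [b - a for a, b in zip(x, x[1:])]


def periodogram(x):
    """Squared DFT magnitude at frequencies f = 1..floor(m/2)."""
    m = len(x)
    powers = []
    for f in range(1, m // 2 + 1):
        # t runs from 1 to m to mirror the indexing used in the text.
        total = sum(
            x[t - 1] * cmath.exp(-2j * math.pi * f * t / m)
            for t in range(1, m + 1)
        )
        powers.append(abs(total) ** 2)
    return powers


def representations(x):
    """Bundle the raw series, its first difference, and its periodogram."""
    return {
        "raw": list(x),
        "diff": first_difference(x),
        "periodogram": periodogram(x),
    }


# A series that repeats every 4 points concentrates its power at f = 2 when m = 8.
reps = representations([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
```

A production implementation would normally use an FFT; the direct sum is quadratic in $m$ and is used here only for readability.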

From each representation, $k$ random intervals are selected per base tree. Each interval is defined by a start point $s$ and a length $l$, with $s \ge 1$ and $s + l - 1 \le L$, where $L$ is the effective length of the representation. In the multivariate case, an interval is also randomly assigned to a dimension $j \in \{1, \ldots, d\}$.

Within each interval, DrCIF computes a subset of $a$ features drawn randomly from a candidate pool of 29 features:

  • 7 classical features: mean, standard deviation, least-squares slope, median, interquartile range, minimum, and maximum.
  • 22 catch22 features: a canonical subset representing measures of autocorrelation, entropy, distributional characteristics, and fluctuation properties (see Lubba et al., 2019 for detailed definitions).

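This results in each tree extracting $3 \cdot k \cdot a$ features per series (three representations, $k$ intervals per representation, $a$ features per interval).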

| Representation | Interval Source | Feature Types |
|---|---|---|
| Raw series | $X_1$ | 7 classic, 22 catch22 |
| First difference | $X_2$ | 7 classic, 22 catch22 |
| Periodogram | $X_3$, $P(f)$ | 7 classic, 22 catch22 |
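As a rough sketch of the interval sampling and the seven classical features described above, the following uses only the standard library. Several details are assumptions made for illustration rather than taken from the source: the helper names, the 0-based indexing, the minimum interval length of 3, the population standard deviation, and the quartile convention (the default of `statistics.quantiles`). The 22 catch22 features are not reproduced here.

```python
import random
import statistics


def sample_interval(rep_length, rng, min_length=3):
    """Draw (start, length) so that start + length <= rep_length (0-based)."""
    length = rng.randint(min_length, rep_length)
    start = rng.randint(0, rep_length - length)
    return start, length


def ls_slope(values):
    """Least-squares slope of the values against their indices 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(values)
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def classical_features(values):
    """Compute the seven classical summary statistics on one interval."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population form; an assumption
        "slope": ls_slope(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }


rng = random.Random(0)
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
start, length = sample_interval(len(series), rng)
features = classical_features(series[start:start + length])
```

Because the example series is a straight line with unit increments, any sampled interval yields a slope of 1, which makes the output easy to sanity-check.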

3. Forest Construction and Training Procedure

DrCIF employs an ensemble of $r$ unpruned “time-series trees,” with each tree trained on randomly subsampled features extracted from randomly chosen intervals of all three representations. Each node split in the tree is determined by maximizing information gain over the features extracted for that tree. The impurity function can be either Gini impurity,

$G(p) = 1 - \sum_{c=1}^{C} p_c^2$

or entropy,

$H(p) = -\sum_{c=1}^{C} p_c \log_2 p_c$

where $p = (p_1, \ldots, p_C)$ is the class frequency vector in a node with $C$ classes.
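The split criterion can be illustrated with the textbook definitions of Gini impurity, $1 - \sum_c p_c^2$, and entropy, $-\sum_c p_c \log_2 p_c$. The sketch below evaluates the information gain of a single threshold split on one feature. The function names and the `<=` threshold convention are illustrative assumptions, not details confirmed by the source.

```python
import math
from collections import Counter


def class_frequencies(labels):
    """Relative frequency of each class present in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum of squared class frequencies."""
    return 1.0 - sum(p * p for p in class_frequencies(labels))


def entropy(labels):
    """Shannon entropy in bits; classes with zero count never appear."""
    return -sum(p * math.log2(p) for p in class_frequencies(labels))


def information_gain(labels, feature_values, threshold, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for v, y in zip(feature_values, labels) if v <= threshold]
    right = [y for v, y in zip(feature_values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


labels = [0, 0, 1, 1]
values = [1.0, 2.0, 3.0, 4.0]
gain_entropy = information_gain(labels, values, threshold=2.5)
gain_gini = information_gain(labels, values, threshold=2.5, impurity=gini)
```

In this toy case the threshold separates the two classes perfectly, so the gain equals the parent impurity under either criterion.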

Key hyperparameters and their defaults:

  • $r$ (number of trees): 500
  • $k$ (intervals per representation per tree): $4 + (\sqrt{L} \cdot \sqrt{d})/3$, with $L$ the length of the representation and $d$ the number of dimensions
  • $a$ (features per tree): 10

The trees are grown without pruning, utilizing only the features selected for that tree by random attribute subsampling and interval selection. Classification is performed by majority vote over all $r$ trees.
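To make the defaults concrete, the sketch below computes how many features a single tree would see and combines per-tree predictions by majority vote. The interval-count rule, 4 + sqrt(representation length) * sqrt(dimensions) / 3, is an assumption based on how common open-source implementations document their default; truncating it to an integer is a further assumption. All function names are hypothetical.

```python
import math
from collections import Counter

N_CANDIDATE_FEATURES = 29  # 7 classical + 22 catch22, as stated in the text


def default_n_intervals(rep_length, n_dims=1):
    """Assumed default: 4 + sqrt(rep_length) * sqrt(n_dims) / 3, truncated."""
    return int(4 + math.sqrt(rep_length) * math.sqrt(n_dims) / 3)


def features_per_tree(rep_lengths, n_attributes=10, n_dims=1):
    """Total feature columns: intervals summed over representations, times a."""
    n_intervals = sum(default_n_intervals(length, n_dims) for length in rep_lengths)
    return n_intervals * n_attributes


def majority_vote(tree_predictions):
    """Most frequent label; ties resolve to the label seen first."""
    return Counter(tree_predictions).most_common(1)[0][0]


# A univariate series of length m = 100 yields representations of
# length 100 (raw), 99 (first difference), and 50 (periodogram).
n_columns = features_per_tree([100, 99, 50])
winner = majority_vote(["a", "b", "a", "c", "a"])
```

Under this rule the number of intervals differs slightly between representations because their lengths differ, so the per-tree feature count is not always exactly three times a single value of $k$.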

Pseudocode for training a DrCIF tree:

Given a training set $D = \{(x_i, y_i)\}_{i=1}^{n}$:

  1. Draw a random subset $A$ of $a$ features from the 29 candidates.
  2. For each representation ($X_1, X_2, X_3$), repeat $k$ times:
    • Randomly select $(s, l, j)$ for interval position, length, and dimension.
    • For each series $x_i$ and feature $f \in A$, compute $f$ on the interval $[s, s + l - 1]$ of dimension $j$ of $x_i$.
  3. Construct an unpruned tree with the resulting $n \times 3ka$ feature matrix, splitting nodes by information gain.
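Steps 1 and 2 of the procedure above can be sketched as a function that builds the feature matrix for one tree. This is a simplified illustration under several assumptions: it handles univariate series only (no dimension sampling), the feature pool is a hypothetical stand-in containing five classical statistics rather than the full 29 candidates, and the minimum interval length of 3 is arbitrary. Step 3, fitting an unpruned decision tree on the matrix, is left to any tree learner.

```python
import random
import statistics

# Hypothetical stand-in pool: classical statistics only, no catch22.
FEATURE_POOL = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def build_tree_features(representations, n_intervals, n_attributes, rng, min_length=3):
    """Build one tree's feature matrix.

    representations[r][i] is series i under representation r.
    Returns (matrix, attributes, spec), where matrix[i] is the feature row
    for series i and spec records (representation, start, length) per interval.
    """
    # Step 1: subsample the attributes used by this tree.
    attributes = rng.sample(sorted(FEATURE_POOL), n_attributes)
    n_series = len(representations[0])
    matrix = [[] for _ in range(n_series)]
    spec = []
    # Step 2: sample intervals per representation and extract features.
    for rep_index, rep in enumerate(representations):
        rep_length = len(rep[0])
        for _ in range(n_intervals):
            length = rng.randint(min_length, rep_length)
            start = rng.randint(0, rep_length - length)
            spec.append((rep_index, start, length))
            for i, series in enumerate(rep):
                window = series[start:start + length]
                matrix[i].extend(FEATURE_POOL[name](window) for name in attributes)
    return matrix, attributes, spec


rng = random.Random(0)
raw = [[float(i + j) for j in range(10)] for i in range(4)]
diff = [[b - a for a, b in zip(s, s[1:])] for s in raw]
matrix, attributes, spec = build_tree_features([raw, diff], 3, 2, rng)
```

The returned `spec` and `attributes` are what a tree would need to store in order to transform unseen series the same way at prediction time.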

4. Computational Complexity and Implementation

Let $m$ be the time series length, $n$ the number of series, $r$ the number of trees, $k$ the intervals per representation, and $a$ the attributes per tree. The dominant computational cost in DrCIF arises from feature extraction:

  • Feature extraction per tree: computes $3ka$ interval features for each of the $n$ series, with each interval at most $m$ points long.
  • Tree construction per tree: operates on the resulting $n \times 3ka$ feature matrix and does not depend directly on $m$.
  • Total training time: the per-tree cost repeated for each of the $r$ trees.

Memory requirements are dominated by storage for a single feature matrix ($n \times 3ka$) and a single tree during construction; total ensemble storage grows linearly with the number of trees $r$.

Key efficiency optimizations include:

  • Randomized interval selection, avoiding exhaustive $O(m^2)$ search
  • Attribute subsampling ($a$ of the 29 candidates) per tree
  • Vectorized computation and reuse of intermediate statistics for classic summaries (means, variances)
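A quick count shows why exhaustive interval search is quadratic in the series length. The sketch below enumerates how many contiguous intervals exist in a series of length $m$; the function name and the `min_length` parameter are illustrative assumptions.

```python
def n_possible_intervals(m, min_length=1):
    """Count contiguous intervals of length >= min_length in a series of length m."""
    # For each admissible length there are m - length + 1 possible start points.
    return sum(m - length + 1 for length in range(min_length, m + 1))


exhaustive = n_possible_intervals(100)  # equals 100 * 101 / 2
```

With `min_length=1` the count is $m(m+1)/2$, so a series of length 100 already has thousands of candidate intervals, whereas random selection evaluates only a small fixed number per representation and tree.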

5. Empirical Performance and Benchmarks

DrCIF demonstrates superior empirical performance among interval-based classifiers. On 112 univariate UCR datasets, averaged over 30 stratified resamples, DrCIF outperforms TSF, CIF, RISE, STSF, and similar interval ensembles. Pairwise Wilcoxon signed-rank tests with Holm correction at significance level $\alpha$ rank DrCIF as the top interval classifier: test-set accuracy is approximately 1–1.5 percentage points higher than CIF and 2–3 points higher than TSF. DrCIF’s contributions are central to HIVE-COTE 2.0’s performance, enabling the meta-ensemble to surpass leading alternatives (including ROCKET, InceptionTime, TS-CHIEF, and HIVE-COTE 1.0) on both univariate (UCR) and multivariate (UEA) benchmarks (Middlehurst et al., 2021).

6. Significance and Role in Meta-Ensembles

DrCIF exemplifies the design principle that leveraging multiple transformed views of time series and an expanded interval-feature space produces improved discrimination and robustness. Its unification of time, difference, and spectral domain features with diverse summary statistics enables more informative splits within its trees, which ultimately translates to robust majority-vote classification. Within HIVE-COTE 2.0, DrCIF’s strengths are critical to the ensemble’s overall accuracy improvements, as it effectively replaces and improves upon both phase-dependent and spectral interval constituents previously used.

This suggests that interval ensembles like DrCIF provide a highly effective mechanism for exploiting temporal locality and multi-domain redundancy in supervised time series classification, especially when equipped with diverse and empirically validated feature sets (Middlehurst et al., 2021).
