TimeREISE: Time-series Randomized Evolving Input Sample Explanation (2202.07952v2)

Published 16 Feb 2022 in cs.LG and cs.AI

Abstract: Deep neural networks are one of the most successful classifiers across different domains. However, due to their limitations concerning interpretability their use is limited in safety critical context. The research field of explainable artificial intelligence addresses this problem. However, most of the interpretability methods are aligned to the image modality by design. The paper introduces TimeREISE a model agnostic attribution method specifically aligned to success in the context of time series classification. The method shows superior performance compared to existing approaches concerning different well-established measurements. TimeREISE is applicable to any time series classification network, its runtime does not scale in a linear manner concerning the input shape and it does not rely on prior data knowledge.

Authors (3)
  1. Dominique Mercier (14 papers)
  2. Andreas Dengel (188 papers)
  3. Sheraz Ahmed (64 papers)
Citations (6)

Summary

The paper "TimeREISE: Time-series Randomized Evolving Input Sample Explanation" (Mercier et al., 2022) introduces a novel post-hoc, model-agnostic attribution method designed specifically for time series classification. It addresses the critical need for interpretability in deep learning models applied to time series data, particularly in safety-critical domains where understanding model decisions is essential.

The authors highlight that while deep learning has achieved significant success in time series classification, its black-box nature hinders adoption in areas like critical infrastructure, medical, and financial sectors. Existing interpretability methods, especially attribution techniques, are often adapted from the image domain and do not adequately address the unique characteristics of time series data, such as potentially infinite length, multiple channels with distinct roles, smoothness requirements, and robustness against perturbations. Noisy or non-continuous explanations, acceptable in images, can be misleading for time series.

TimeREISE is presented as a solution that overcomes these limitations. It is inspired by the RISE (Randomized Input Sampling for Explanation) method used for images but incorporates several key adaptations for time series. The core idea, similar to RISE, is perturbation-based: generate random masks, apply them to the input time series, pass the masked input through the classifier to get confidence scores, and then combine the masks weighted by the corresponding scores to produce an attribution map.
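In sketch form, this weighted-sum scheme looks like the following (a minimal illustration under stated assumptions, not the paper's implementation; the `classify` callable, the toy shapes, and the mask count are placeholders):

```python
import numpy as np

def rise_attribution(x, classify, masks):
    """Perturbation-based attribution: weight each random mask by the
    classifier's confidence on the masked input, then average."""
    attribution = np.zeros_like(x, dtype=float)
    for mask in masks:
        score = classify(x * mask)      # confidence for the target class
        attribution += score * mask     # masks that keep the signal score higher
    return attribution / len(masks)

# Toy example with a stand-in "classifier" that responds to the mean signal
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 50))            # 3 channels, 50 timesteps
masks = rng.random((100, 3, 50)) > 0.5  # 100 random binary masks
attr = rise_attribution(x, lambda z: float(z.mean()), masks)
print(attr.shape)                       # (3, 50)
```

Features that co-occur with high confidence scores across many random masks accumulate the most weight, which is the core intuition carried over from RISE.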

The key adaptations for TimeREISE are:

  1. Channel-aware Masking: Instead of applying the same mask across all channels at a given timestep, TimeREISE generates masks that can occlude different timesteps across different channels within a single mask. This better reflects the potential distinct roles of channels in multivariate time series.
  2. Varying Density and Granularity: While the original RISE uses masks with a fixed density of occluded points, TimeREISE utilizes a set of masks generated with varying densities (probability of occlusion) and granularities (size of occluded patches, controlled by downsampling). The final attribution is a summation over results obtained from masks of different densities and granularities, removing the assumption of a fixed number of relevant data points or patch sizes.
  3. Normalization: The contribution of each feature (timestep per channel) is normalized by the number of times it appeared in the applied masks. The final attribution map is then normalized to values between 0 and 1.

The implementation involves two stages:

  1. Mask Generation (Initialization): This needs to be done only once per dataset or input shape. For predefined sets of probabilities $P$ and granularities $G$, and a number of masks $N$, a set of masks $M$ is generated. Masks are created by downsampling the input shape $(C, T)$ along the time axis to $(C, T')$ based on granularity $g$, applying a uniform distribution thresholded by probability $p$, upsampling the mask back to the original shape $(C, T)$, and then cropping if necessary.
  2. Mask Application (Attribution): For a given input $x$ and classifier $\theta$, each generated mask $m_i$ from $M$ is applied using a perturbation function $\sigma$ (by default, element-wise multiplication). The masked input $x_{m_i}$ is fed to the classifier $\theta$ to obtain prediction scores $S$. The scores are then combined with the masks $M$ (matrix product $S^T \times M$) and normalized by the number of times each feature appears across all masks, resulting in the final attribution map.
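The two stages can be sketched as follows (a simplified reconstruction, assuming linear interpolation for upsampling and mean-score normalization; the probability/granularity grids and helper names are illustrative, not the paper's exact settings):

```python
import numpy as np

def generate_masks(C, T, probs=(0.1, 0.3, 0.5), grans=(4, 8),
                   n_per_setting=50, seed=0):
    """Stage 1 (run once per input shape): channel-aware masks drawn at
    several occlusion probabilities p and granularities g, upsampled with
    linear interpolation so occluded patches are smooth."""
    rng = np.random.default_rng(seed)
    masks = []
    for p in probs:
        for g in grans:
            t_small = max(2, T // g)                 # coarse time axis
            for _ in range(n_per_setting):
                # 0 = occluded (with probability p), 1 = kept; drawn
                # independently per channel (channel-aware masking)
                coarse = (rng.random((C, t_small)) >= p).astype(float)
                fine = np.stack([np.interp(np.linspace(0, t_small - 1, T),
                                           np.arange(t_small), coarse[c])
                                 for c in range(C)])
                masks.append(fine)
    return np.stack(masks)                           # shape (N, C, T)

def apply_masks(x, classify, masks):
    """Stage 2: score each masked input, combine scores with masks
    (S^T x M), normalize by per-feature coverage, then scale to [0, 1]."""
    scores = np.array([classify(x * m) for m in masks])   # (N,)
    attr = np.tensordot(scores, masks, axes=1)            # S^T x M -> (C, T)
    attr /= masks.sum(axis=0) + 1e-8                      # coverage normalization
    attr -= attr.min()
    return attr / (attr.max() + 1e-8)                     # final map in [0, 1]

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 64))
masks = generate_masks(C=3, T=64)
attr = apply_masks(x, lambda z: float(z.mean()), masks)
print(attr.shape)    # (3, 64)
```

Because mask generation is decoupled from attribution, the same mask set can be reused for every input of that shape, which is where the method's runtime advantage comes from.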

The paper evaluates TimeREISE against several state-of-the-art time series attribution methods: GuidedBackprop, IntegratedGradients, FeatureAblation, Occlusion, and LIME, covering different categories (gradient-based, perturbation-based, local model). The experiments were conducted on a variety of datasets from the UEA & UCR repository, including a synthetic Anomaly dataset with ground truth explanations. InceptionTime (Ismail Fawaz et al., 2020), a state-of-the-art time series classifier, was used as the target model. Evaluation metrics included:

  • Insertion and Deletion (Causal Metric): Measuring the change in classifier performance (accuracy) when important features (as identified by the attribution) are progressively removed (deletion) or added (insertion). Lower AUC for deletion and higher AUC for insertion indicate better performance.
  • Infidelity and Sensitivity: Quantifying the relationship between perturbations in the attribution/input and changes in prediction. Lower values are better for both.
  • Continuity: Measuring the smoothness of the attribution map by summing the absolute differences between adjacent points. Lower values indicate smoother attributions, which can be easier to interpret visually but may conflict with exact correctness.
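The deletion variant of the causal metric can be sketched as below (a schematic only; insertion is analogous but starts from a fully occluded input and adds features back, and the `fill` baseline, step count, and toy "classifier" are assumptions):

```python
import numpy as np

def deletion_auc(x, attr, classify, steps=10, fill=0.0):
    """Schematic deletion metric: occlude the most-attributed features
    first and track the classifier score; lower AUC means the attribution
    found the features the model actually relies on."""
    order = np.argsort(attr.ravel())[::-1]       # most important first
    xd = x.astype(float).ravel().copy()
    scores = [classify(xd.reshape(x.shape))]
    chunk = max(1, xd.size // steps)
    for i in range(0, xd.size, chunk):
        xd[order[i:i + chunk]] = fill            # delete the next chunk
        scores.append(classify(xd.reshape(x.shape)))
    return float(np.mean(scores))                # mean score ~ normalized AUC

# Toy check: for a magnitude-sensitive "classifier", an attribution equal
# to |x| should drive the score down faster than a random attribution.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 100))
clf = lambda z: float(np.abs(z).sum())
good = deletion_auc(x, np.abs(x), clf)
rand = deletion_auc(x, rng.random(x.shape), clf)
print(good < rand)    # True: better attributions give lower deletion AUC
```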

Key Experimental Findings:

  • Deletion & Insertion: TimeREISE significantly outperformed other methods in the deletion task (lowest average AUC) and achieved the best average AUC in the insertion task. This indicates its effectiveness in identifying features crucial for the classifier's prediction.
  • Sensitivity: TimeREISE achieved the lowest average Sensitivity score, demonstrating superior robustness of its attribution maps to small input perturbations compared to other methods, especially gradient-based ones.
  • Continuity: TimeREISE showed superior performance (lowest average score) in terms of Continuity, producing smoother attribution maps compared to other methods. This is attributed to the mask generation process involving downsampling and upsampling with interpolation.
  • Infidelity: All evaluated methods showed insignificant differences in Infidelity scores, suggesting that perturbing the attribution maps similarly affects the predictions across methods.
  • Runtime: The theoretical runtime analysis shows TimeREISE scales linearly with the number of masks $N$ and is independent of the input size $(C \times T)$, which is more efficient than methods whose cost depends directly on the number of features, particularly for long time series.
  • Visualization: Visual examples demonstrate that TimeREISE produces smoother attribution maps while still accurately highlighting important regions, like the peak in the Anomaly dataset or specific segments in ECG signals.
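The runtime claim can be made concrete by counting classifier forward passes, the dominant cost for both method families (a sketch; the per-feature figure assumes one occlusion pass per feature, as in a FeatureAblation-style method):

```python
def calls_timereise(n_masks: int, C: int, T: int) -> int:
    """One forward pass per mask: O(N), independent of the input shape."""
    return n_masks

def calls_per_feature(C: int, T: int) -> int:
    """One forward pass per occluded feature: O(C * T)."""
    return C * T

# For a long 12-channel series, the gap is large:
print(calls_timereise(500, 12, 10_000))   # 500
print(calls_per_feature(12, 10_000))      # 120000
```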

In conclusion, TimeREISE is presented as a practical and effective attribution method for time series classification. Its adaptations for time series data lead to superior performance in key metrics like identifying important features (Deletion/Insertion), robustness (Sensitivity), and visual interpretability (Continuity) compared to existing methods. It is model-agnostic, applicable to black-box classifiers, and offers efficient runtime scaling, making it a valuable tool for applying explainable AI in real-world time series applications.