
Shapelet-Based Glitches

Updated 21 December 2025
  • Shapelet-based glitches are transient, localized anomalies in time series characterized by informative subsequences that capture essential waveform features.
  • They are modeled with frameworks such as Gauss–Hermite and discriminative shapelets, which improve detection accuracy in domains such as gravitational-wave and biomedical signal analysis.
  • Advanced techniques including matching pursuit, Bayesian MCMC, and dual-attention modules enhance glitch detection, parameter estimation, and synthetic generation.

Shapelet-based glitches are transient, localized anomalies in time series that are characterized and detected using "shapelets"—compact, informative subsequences capturing essential waveform features. The concept arises from the intersection of time-domain morphological analysis and statistical pattern recognition, with demonstrated impact in gravitational-wave instrumentation, clinical biomedical signals, and industrial time-series anomaly detection. Shapelet-based approaches are central to robust glitch modeling, detection, generation, and interpretable classification in noisy, dynamic measurement environments, often outperforming conventional pointwise or frequency-domain detectors in their ability to resolve and parameterize transient, structured events.

1. Mathematical Foundations of Shapelets

A shapelet is formally a short subsequence $S = [s_1, \ldots, s_\ell]$ extracted from a longer time series $X = [x_1, \ldots, x_T]$, with $\ell \ll T$. In the anomaly detection context, a shapelet-based glitch is defined as a localized instance where a subsequence exhibits a statistical or morphological deviation from the expected (normal) pattern, as measured by a chosen dissimilarity metric $D(\cdot,\cdot)$. Two principal shapelet frameworks have been established:

  • Gauss–Hermite (physical) shapelets: In the context of instrumental glitches, a basis of one-dimensional Gauss–Hermite functions $\phi_n(t;\beta)$ is used, where $n$ is the order (node count), $\beta$ the scale (width), and $H_n(\cdot)$ the Hermite polynomial. An arbitrary transient $g(t)$ is decomposed as $g(t) = \sum_{n=0}^{N} c_n \phi_n(t;\beta)$. Sparse and robust parameterization is achieved through $\ell_0$-penalized likelihood maximization and matching pursuit, with subsequent refinement by Bayesian MCMC (Baghi et al., 2021).
  • Discriminative time series shapelets: For interpretable glitch classification, a shapelet is extracted so as to maximize its class-discriminative power, e.g., by searching for $S$ that achieves high information gain (IG) when splitting a dataset based on $\mathrm{PSD}(X,S)$, the Perceptual Subsequence Distance (Le et al., 9 Mar 2025).

In both cases, the waveform is reduced to a set of localized features encapsulating amplitude, scale, and temporal position, enabling fine-grained, noise-resilient glitch representation.
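The Gauss–Hermite decomposition can be illustrated with a minimal sketch. The normalization used below, $\phi_n(t;\beta) = (2^n n! \sqrt{\pi}\,\beta)^{-1/2} H_n(t/\beta)\, e^{-t^2/(2\beta^2)}$, is the common orthonormal convention and is an assumption here, since the source does not spell it out. The coefficients are fitted by ordinary least squares as a simple stand-in for the $\ell_0$-penalized fit; the function names are illustrative only.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval  # physicists' Hermite polynomials

def gauss_hermite(n, t, beta):
    """Order-n Gauss-Hermite function of width beta (assumed orthonormal form)."""
    x = np.asarray(t, dtype=float) / beta
    select = np.zeros(n + 1)
    select[n] = 1.0  # coefficient vector that picks out H_n
    norm = 1.0 / math.sqrt(2.0**n * math.factorial(n) * math.sqrt(math.pi) * beta)
    return norm * hermval(x, select) * np.exp(-0.5 * x**2)

def fit_coefficients(t, g, n_max, beta):
    """Least-squares estimate of c_0..c_N in g(t) ~ sum_n c_n phi_n(t; beta).

    Plain least squares, not the l0-penalized likelihood named in the text.
    """
    basis = np.stack([gauss_hermite(n, t, beta) for n in range(n_max + 1)], axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, g, rcond=None)
    return coeffs, basis @ coeffs

# Build a transient from known coefficients and recover them.
t = np.linspace(-10.0, 10.0, 2001)
true_c = np.array([1.0, 0.0, -0.5, 0.25])
g = sum(c * gauss_hermite(n, t, 1.5) for n, c in enumerate(true_c))
est_c, g_hat = fit_coefficients(t, g, n_max=3, beta=1.5)
```

In this noise-free toy case the fit recovers the input coefficients; with real data the scale $\beta$ and the set of active orders would also have to be estimated.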

2. Detection and Parameter Estimation

Shapelet-based glitch detection involves:

  • Decomposing observed series into candidate shapelet components,
  • Estimating the significance and parameters $\{c_n, \beta, \tau\}$,
  • Iteratively subtracting significant atoms via matching pursuit or greedy algorithms.

For physical setups (e.g., LISA Pathfinder), this proceeds by matched filtering for different values of $n$, $\beta$, and $\tau$, terminating when no shapelet atom achieves an SNR above threshold (e.g., SNR $\gtrsim 5$ corresponds to a $\sim 0.01\%$ false-alarm rate per 2.5 days). Bayesian MCMC post-processing refines parameters for the most significant events, correcting noise-weighting and interpolation biases.
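The greedy loop described above can be sketched as follows. This is a simplified illustration, not the pipeline of Baghi et al.: it assumes white Gaussian noise of known standard deviation `sigma`, a small hypothetical grid of $(n, \beta, \tau)$ values, and unit-norm atoms, so that the matched-filter SNR of an atom is simply its inner product with the residual divided by `sigma`.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def atom(n, beta, tau, t):
    """Unit-norm (discrete l2) Gauss-Hermite atom centred at tau."""
    x = (t - tau) / beta
    select = np.zeros(n + 1)
    select[n] = 1.0
    a = hermval(x, select) * np.exp(-0.5 * x**2)
    return a / np.linalg.norm(a)

def matching_pursuit(y, t, sigma, orders, betas, taus, snr_min=5.0, max_atoms=20):
    """Subtract the highest-SNR atom repeatedly until none exceeds snr_min."""
    params = [(n, b, tau) for n in orders for b in betas for tau in taus]
    dictionary = np.stack([atom(n, b, tau, t) for n, b, tau in params])
    residual = np.asarray(y, dtype=float).copy()
    found = []
    for _ in range(max_atoms):
        amplitudes = dictionary @ residual   # matched-filter output per atom
        snr = np.abs(amplitudes) / sigma     # valid for white noise, unit-norm atoms
        k = int(np.argmax(snr))
        if snr[k] < snr_min:
            break                            # nothing significant is left
        found.append((params[k], float(amplitudes[k]), float(snr[k])))
        residual -= amplitudes[k] * dictionary[k]
    return found, residual

# Toy data: two injected atoms in unit-variance white noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 200.0, 0.5)
sigma = 1.0
y = (rng.normal(0.0, sigma, t.size)
     + 30.0 * atom(0, 4.0, 60.0, t)
     - 20.0 * atom(1, 2.0, 140.0, t))
found, residual = matching_pursuit(
    y, t, sigma,
    orders=[0, 1, 2], betas=[2.0, 4.0, 8.0], taus=np.arange(10.0, 191.0, 5.0),
)
```

With colored instrument noise the inner product would need to be noise-weighted (for example in the frequency domain), and the grid estimates would then be refined, which is the role the text assigns to the Bayesian MCMC step.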

In discriminative classification scenarios (e.g., patient-ventilator synchrony), candidate shapelets are efficiently extracted using Perceptually Important Points (PIPs) as segment boundaries, ranking candidates by information gain. The final detection layer typically involves feeding shapelet-distance feature vectors, optionally concatenated with handcrafted statistical signatures, into a shallow feed-forward network (Le et al., 9 Mar 2025).
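Ranking candidates by information gain can be sketched as below. Two simplifications are assumptions of this example rather than part of the cited method: the minimum sliding-window Euclidean distance stands in for PSD, whose exact definition is not reproduced here, and the candidates are supplied directly instead of being cut at Perceptually Important Points.

```python
import numpy as np

def min_subsequence_distance(x, s):
    """Smallest Euclidean distance between s and any same-length window of x.

    Stand-in for PSD(X, S); not the definition used in the cited work.
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, len(s))
    return float(np.sqrt(((windows - s) ** 2).sum(axis=1).min()))

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(distances, labels):
    """Best information gain over all thresholds on the sorted distances."""
    order = np.argsort(distances)
    d, y = np.asarray(distances)[order], np.asarray(labels)[order]
    base, best = entropy(y), 0.0
    for i in range(1, len(d)):
        if d[i] == d[i - 1]:
            continue  # no threshold separates equal distances
        left, right = y[:i], y[i:]
        split = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        best = max(best, base - split)
    return best

def rank_shapelets(series, labels, candidates):
    """Return candidate indices sorted by decreasing information gain."""
    scores = []
    for s in candidates:
        dists = [min_subsequence_distance(x, s) for x in series]
        scores.append(information_gain(dists, labels))
    return np.argsort(scores)[::-1], scores

# Toy data: class 1 contains a bump, class 0 is noise only.
rng = np.random.default_rng(1)
bump = np.array([0.0, 2.0, 4.0, 2.0, 0.0])
series, labels = [], []
for k in range(20):
    x = rng.normal(0.0, 0.1, 50)
    if k % 2:
        x[20:25] += bump
    series.append(x)
    labels.append(k % 2)
order, scores = rank_shapelets(series, labels, [bump, np.zeros(5)])
```

The bump candidate separates the two classes and is ranked first, while the flat candidate matches both classes about equally well and carries little information.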

3. Statistical Characterization and Synthetic Generation

The empirical distributions of glitch parameters (such as inter-arrival times, amplitude scales, and damping times) provide insight into systematics and rare event structure. For instance, in LISA Pathfinder glitch populations, inter-arrival intervals $\Delta\tau$ follow an exponential law with $\lambda \approx 5 \times 10^{-5}\,\mathrm{s}^{-1}$ in ordinary runs, with a tenfold increase in "cold" runs (Baghi et al., 2021).
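To make the quoted rate concrete, the implied time scales follow directly from the exponential law. The cold-run line assumes that "tenfold increase" refers to the rate $\lambda$, which is an interpretation rather than a statement from the source.

```latex
\begin{align}
p(\Delta\tau) &= \lambda\, e^{-\lambda \Delta\tau}, \\
\langle \Delta\tau \rangle &= \frac{1}{\lambda}
  \approx \frac{1}{5 \times 10^{-5}\,\mathrm{s}^{-1}}
  = 2 \times 10^{4}\,\mathrm{s} \approx 5.6\,\mathrm{h}, \\
\lambda \times 86400\,\mathrm{s} &\approx 4.3 \ \text{glitches per day}, \\
\lambda_{\mathrm{cold}} \approx 10\,\lambda = 5 \times 10^{-4}\,\mathrm{s}^{-1}
  \;&\Rightarrow\; \langle \Delta\tau \rangle_{\mathrm{cold}} \approx 2 \times 10^{3}\,\mathrm{s} \approx 33\,\mathrm{min}.
\end{align}
```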

Synthetic glitch generation is conducted by:

  • Sampling glitch times $\tau_i$ from the empirical exponential;
  • Drawing pairs $(\alpha, \beta)$ from the joint amplitude-damping distribution, typically modeled with normalizing flows to capture tail behavior;
  • Assembling the synthetic waveform $g(t) = \alpha\, \psi_1\!\left(\frac{t-\tau}{\beta}\right)$ with $\psi_1$ a suitable exponential or Hermite shapelet.

This approach supports large-scale simulation studies and algorithm validation in both physics and biomedical domains (Baghi et al., 2021, Le et al., 9 Mar 2025).
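The three generation steps can be sketched in a few lines. Several choices here are placeholders and not taken from the source: a correlated log-normal replaces the normalizing-flow model of $(\alpha, \beta)$, its mean and covariance are arbitrary, and $\psi_1$ is taken to be a one-sided decaying exponential.

```python
import numpy as np

def one_sided_exponential(u):
    """Hypothetical psi_1: zero before onset, exponential decay afterwards."""
    return np.where(u >= 0.0, np.exp(-np.clip(u, 0.0, None)), 0.0)

def simulate_glitches(duration, dt, rate, rng,
                      log_mean=(0.0, 3.0),
                      log_cov=((1.0, 0.3), (0.3, 0.5))):
    """Return (t, signal, taus, alphas, betas) for one synthetic glitch stream."""
    t = np.arange(0.0, duration, dt)
    # Step 1: arrival times as the cumulative sum of exponential gaps.
    expected = rate * duration
    n_draw = int(expected + 10.0 * np.sqrt(expected) + 10)  # generous upper bound
    taus = np.cumsum(rng.exponential(1.0 / rate, size=n_draw))
    taus = taus[taus < duration]
    # Step 2: joint (alpha, beta); log-normal stand-in for a normalizing flow.
    log_params = rng.multivariate_normal(log_mean, log_cov, size=taus.size)
    alphas, betas = np.exp(log_params[:, 0]), np.exp(log_params[:, 1])
    # Step 3: superpose g(t) = alpha * psi_1((t - tau) / beta) for each glitch.
    signal = np.zeros_like(t)
    for tau, a, b in zip(taus, alphas, betas):
        signal += a * one_sided_exponential((t - tau) / b)
    return t, signal, taus, alphas, betas

# 2.5 days at the ordinary-run rate quoted above, sampled every 10 s.
rng = np.random.default_rng(0)
t, signal, taus, alphas, betas = simulate_glitches(
    duration=2.5 * 86400.0, dt=10.0, rate=5e-5, rng=rng
)
```

Swapping in a fitted density model for step 2 leaves the rest of the procedure unchanged, which is why the parameter distribution is isolated in a single line.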

4. Advanced Classification and Interpretability

Interpretability is inherent in shapelet-based schemes. Once a pool of discovered shapelets $\{S^j\}$ is established, any input $X$ can be mapped to a vector of shapelet distances $Z_{\text{sha}}^j = \mathrm{PSD}(X, S^j)$, which serves as a directly explainable feature set. Model decisions can be visualized by overlaying best-matched shapelets on the input, generating "heat-maps" of response intensity along time, allowing for qualitative validation by domain experts (e.g., clinicians in ventilator scenarios) (Le et al., 9 Mar 2025).

SHIP, for example, concatenates shapelet vectors with statistical summaries (e.g., logarithmic signatures) and trains a compact three-layer classifier. The subsequence matches are used for post-hoc explanations: the location and distance of best-matching shapelets correspond directly to the detected glitch type, supporting transparent diagnostics.
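The feature mapping and the post-hoc explanation can be sketched together. This is not the SHIP implementation: the sliding Euclidean distance again stands in for PSD, and the `exp(-distance)` response used for the heat map is one hypothetical choice among many.

```python
import numpy as np

def window_distances(x, s):
    """Euclidean distance between s and every same-length window of x."""
    windows = np.lib.stride_tricks.sliding_window_view(x, len(s))
    return np.sqrt(((windows - s) ** 2).sum(axis=1))

def shapelet_features(x, shapelets):
    """Map x to (distance vector, best-match start index per shapelet)."""
    z, starts = [], []
    for s in shapelets:
        d = window_distances(x, s)
        k = int(np.argmin(d))
        z.append(float(d[k]))
        starts.append(k)
    return np.array(z), starts

def response_heat(x, s):
    """Per-sample response: high where a window covering the sample matches s."""
    score = np.exp(-window_distances(x, s))  # hypothetical response function
    heat = np.zeros(len(x))
    for i, v in enumerate(score):
        heat[i:i + len(s)] = np.maximum(heat[i:i + len(s)], v)
    return heat

# Toy input with one embedded pattern.
x = np.zeros(40)
pattern = np.array([1.0, 3.0, 3.0, 1.0])
x[10:14] = pattern
z, starts = shapelet_features(x, [pattern, np.array([0.0, 0.0, 0.0])])
heat = response_heat(x, pattern)
```

Here `z` is the explainable feature vector, `starts` gives where each shapelet would be overlaid on the input, and `heat` peaks on the samples covered by the best match.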

5. Shapelet Frameworks in Time Series Anomaly Detection

The recent TShape framework extends shapelet-based glitch identification to complex, nontrivial industrial time series anomalies using:

  • Patch-wise multi-scale convolution to extract multi-resolution local shapelet features,
  • Patch-wise positional encoding,
  • Dual (local intra-patch and global inter-patch) self-attention mechanisms with gated fusion.

All components are trained solely on normal data to minimize reconstruction error (Cui et al., 1 Oct 2025). The per-time-point anomaly score is $s_t = |x_t - \hat{x}_t|$, and event-level detection is thresholded on these residuals.
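The scoring and event-level step can be sketched independently of the network itself. In the example below the reconstruction `x_hat` is simply given, and the fixed threshold is a placeholder; how TShape selects its threshold is not described here.

```python
import numpy as np

def anomaly_scores(x, x_hat):
    """Per-time-point score s_t = |x_t - x_hat_t|."""
    return np.abs(np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float))

def detect_events(scores, threshold):
    """Group consecutive above-threshold points into (start, end) pairs.

    Indices are sample positions; end is exclusive.
    """
    flags = np.asarray(scores) > threshold
    padded = np.concatenate(([False], flags, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])  # rising and falling edges
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

# Toy example: a perfect reconstruction of the normal pattern plus one glitch.
x_hat = np.sin(np.linspace(0.0, 6.0 * np.pi, 200))
x = x_hat.copy()
x[50:55] += 2.0
scores = anomaly_scores(x, x_hat)
events = detect_events(scores, threshold=1.0)
```

Grouping contiguous flagged samples into events is what makes an event-level metric such as F1-E applicable, since each glitch then counts once regardless of its duration.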

Table: TShape Event-F1 (F1-E) Scores Versus FCVAE Baseline

| Dataset | FCVAE F1-E | TShape F1-E |
|---------|------------|-------------|
| AIOPS   | 0.7364     | 0.8049      |
| NAB     | 0.7933     | 0.9186      |
| TODS    | 0.6689     | 0.8561      |
| UCR     | 0.5126     | 0.5915      |
| WSD     | 0.8695     | 0.9137      |

TShape improves F1-E over FCVAE by about 0.10 in absolute terms on average across these five datasets (mean 0.817 versus 0.716), and ablation studies support the contribution of the multi-scale convolution and dual-attention modules. Attention maps highlight both fine-grained local and global contextual relevance, which suits domains where glitches correspond to localized morphological departures (Cui et al., 1 Oct 2025).
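The average gain can be checked directly from the table. The short calculation below uses only the tabulated values and reports both the absolute and the relative difference, since a figure of roughly 10% matches the absolute gain in F1-E points rather than the relative one.

```python
# F1-E scores copied from the table: dataset -> (FCVAE, TShape)
f1e = {
    "AIOPS": (0.7364, 0.8049),
    "NAB": (0.7933, 0.9186),
    "TODS": (0.6689, 0.8561),
    "UCR": (0.5126, 0.5915),
    "WSD": (0.8695, 0.9137),
}

fcvae_mean = sum(a for a, _ in f1e.values()) / len(f1e)
tshape_mean = sum(b for _, b in f1e.values()) / len(f1e)
absolute_gain = tshape_mean - fcvae_mean      # in F1-E points
relative_gain = absolute_gain / fcvae_mean    # fraction of the FCVAE mean

print(f"FCVAE mean  : {fcvae_mean:.4f}")
print(f"TShape mean : {tshape_mean:.4f}")
print(f"absolute    : {absolute_gain:+.4f}")
print(f"relative    : {relative_gain:+.1%}")
```

The absolute gain is about 0.10, while the relative gain over the FCVAE mean is about 14%.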

6. Impact on Downstream Analysis and Detection Robustness

Shapelet-based glitch modeling crucially affects downstream system performance:

  • In gravitational-wave detection (LISA), synthetic glitches can be projected from LPF records to the LISA data channel as effective fractional laser-frequency deviations, further processed via time-delay interferometry (TDI). Glitch-induced transients may inject SNRs ranging from $10^{-2}$ to $10^{4}$ in TDI A/E/T channels, with ≈50% above SNR ≈ 10, sometimes exceeding astrophysical burst amplitudes within short intervals (Baghi et al., 2021). This suggests that shapelet-modeled glitches, if unaccounted for, could bias astrophysical parameter estimation or trigger false-positive event candidates.
  • In medical and industrial anomaly detection, shapelet features enable robust, interpretable event detection even under significant class imbalance or channel subsampling. For instance, SHIP achieves four-way F1 = 0.9765 and >0.89 per-class F1 for most asynchrony types, surpassing convolutional, recurrent, and latent mixture baselines (Le et al., 9 Mar 2025). TShape’s reconstruction-error methodology achieves average precision 0.88 and recall 0.85 compared to baselines at 0.78/0.76, evidencing superior sensitivity to complex shapelet-based glitches (Cui et al., 1 Oct 2025).

7. Prospects, Limitations, and Research Directions

Shapelet-based approaches generalize across disciplines but feature domain-specific caveats:

  • For detection and simulation, an adequate empirical library of glitches and accurate joint parameter statistics are required.
  • Most current frameworks address univariate or low-dimensional signals; extension to fully multivariate, cross-channel shapelet analysis is limited, though TShape and normalizing flows offer a plausible pathway (Cui et al., 1 Oct 2025).
  • Training dependency on predominantly “clean” background data may limit utility in high-anomaly-rate environments.
  • Future work will likely incorporate prototype-guided attention and self-supervised, library-driven shapelet mining, particularly for complex multi-source or multi-modal industrial signals.

In summary, shapelet-based glitch modeling constitutes a unifying paradigm for transient anomaly detection, offering sparse, interpretable, and highly effective glitch localization and characterization (Baghi et al., 2021, Le et al., 9 Mar 2025, Cui et al., 1 Oct 2025). Its flexible mathematical foundation supports cross-domain application and continues to underpin advances in both scientific instrumentation and critical time-series monitoring.
