Anomaly Detection for Irregular Time Series

Updated 2 October 2025
  • The paper introduces a Gaussian process adapter that converts irregular time series into fixed, uncertainty-aware representations for effective anomaly detection.
  • It employs structured kernel interpolation and Lanczos approximation to scale the model efficiently for long sequences and variable sampling.
  • End-to-end discriminative training with uncertainty propagation enhances anomaly scoring in real-world applications like healthcare, IoT, and industrial monitoring.

Anomaly detection in irregularly sampled time series refers to the identification of rare or unusual patterns, events, or subsequences in temporal data where observations are recorded at non-uniform time intervals. Irregular sampling arises naturally in domains such as healthcare (irregular clinical visits), IoT (asynchronous sensor reporting), and system monitoring (event-driven logging). The inherent challenges include uncertainty from sparsity, varying dimensionality across sequences, non-trivial temporal dependencies, and the need for methods that are computationally and statistically robust across a spectrum of real-world settings.

1. Foundational Challenges in Irregularly Sampled Time Series

Irregular sampling disrupts traditional time-series modeling techniques, which rely on fixed-interval observations and uniformly sized feature vectors. The primary obstacles are:

  • Loss of temporal alignment: Standard models (e.g., ARIMA, RNNs without adaptation) cannot directly process time series with missing values or non-uniform intervals.
  • Data sparsity and uncertainty: Fewer observations per sequence lead to under-determined representations of underlying temporal processes, warranting uncertainty-aware modeling.
  • Non-fixed input dimensionality: Many classical and deep anomaly detectors assume inputs are fixed-length vectors or arrays.

Frameworks must therefore accommodate variability in sequence length, gaps between observations, and the resulting uncertainty in process inference.

2. Gaussian Process Adapter and Uncertainty Propagation

The framework described in "A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification" (Li et al., 2016) introduces a principled approach to converting irregularly sampled data into a regular representation for downstream learning:

  • Gaussian Process Regression Layer: Every irregular sequence is projected to a fixed set of virtual reference points via Gaussian process (GP) regression, yielding a joint normal distribution $N(z \mid \mu, \Sigma)$ over those points. The mean $\mu$ and covariance $\Sigma$ at the reference points $x$ are

$$\mu = K_{x,t}(K_{t,t} + \sigma^2 I)^{-1}v, \qquad \Sigma = K_{x,x} - K_{x,t}(K_{t,t} + \sigma^2 I)^{-1}K_{t,x},$$

where $K_{a,b}$ denotes the kernel matrix evaluated between $a$ and $b$, $v$ is the vector of observed values, and $t$ the vector of observation times (a minimal code sketch follows this list).

  • Fixed-Dimensional Representation: This mapping ensures that even inputs with differing timestamps and lengths are converted into a fixed-dimensional, uncertainty-aware form, suitable for any downstream “black-box” classifier.
  • Uncertainty Propagation: The framework computes the expected loss by integrating over the posterior $N(\mu, \Sigma)$:

$$\mathbb{E}_{z \sim N(\mu, \Sigma; \theta)}[\ell(f(z; w), y)]$$

This recognizes that uncertainty about the latent process should propagate through to loss computation and model updates—a crucial factor in sparse and irregular contexts.
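
To make this concrete, here is a minimal NumPy sketch of the adapter's two pieces: the GP posterior at fixed reference points, and a Monte Carlo estimate of the expected loss via $z = \mu + \Sigma^{1/2}\xi$. The RBF kernel, noise level, and toy loss function are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between time points a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(t, v, x, noise=0.1):
    """Posterior mean/covariance at reference points x, given values v
    observed at irregular times t."""
    K_tt = rbf_kernel(t, t) + noise**2 * np.eye(len(t))
    K_xt = rbf_kernel(x, t)
    mu = K_xt @ np.linalg.solve(K_tt, v)
    Sigma = rbf_kernel(x, x) - K_xt @ np.linalg.solve(K_tt, K_xt.T)
    return mu, Sigma

# An irregularly sampled sequence and a fixed reference grid.
t = np.array([0.1, 0.5, 0.52, 1.9, 3.0])       # non-uniform observation times
v = np.sin(t) + 0.1 * np.random.randn(len(t))  # observed values
x = np.linspace(0.0, 3.0, 8)                   # fixed reference points

mu, Sigma = gp_posterior(t, v, x)

# Expected loss E_{z ~ N(mu, Sigma)}[loss(f(z))] by Monte Carlo,
# sampling z = mu + Sigma^{1/2} xi; the loss here is a placeholder.
L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(len(x)))
xi = np.random.randn(100, len(x))
z = mu + xi @ L.T
expected_loss = np.mean((z.sum(axis=1) - 1.0) ** 2)
```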

This adapter can be plugged into one-class classifiers, autoencoders, or even non-linear discriminators designed for anomaly detection. The explicit modeling of uncertainty supports robust anomaly scoring, as deviations with low posterior confidence are penalized differently from deviations that simply coincide with missingness or data sparsity.

3. Scalability for Large and Long Sequences

Classical GP inference scales as $O(n^3)$, which is intractable for large $n$. The framework (Li et al., 2016) combines:

  • Structured Kernel Interpolation (SKI): Approximates the full kernel via interpolation from a grid of inducing points $u$, using $K_{\alpha,\beta} \approx W_\alpha K_{u,u} W_\beta^\top$ with sparse interpolation weights $W$, which dramatically compresses kernel evaluations.
  • Lanczos Approximation: Efficiently computes products of the form $\Sigma^{1/2}\xi$ (for sampling and reparameterization) via projection onto a low-dimensional Krylov subspace, avoiding Cholesky decomposition or full eigendecomposition (see the sketch after this list).
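
The following NumPy sketch shows the Lanczos idea in isolation: approximating $\Sigma^{1/2}\xi$ using only matrix-vector products with $\Sigma$. The Krylov rank and the dense test matrix are illustrative assumptions; in the actual framework the matrix-vector products would come from the fast SKI representation rather than a dense $\Sigma$.

```python
import numpy as np

def lanczos_sqrt_mv(matvec, xi, k=20):
    """Approximate Sigma^{1/2} @ xi via a rank-k Krylov projection,
    where matvec(v) returns Sigma @ v for a symmetric PSD Sigma."""
    n = xi.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 1))
    Q[:, 0] = xi / np.linalg.norm(xi)
    m = k                                      # effective rank after breakdown
    for j in range(k):
        w = matvec(Q[:, j])
        if j > 0:
            w = w - beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w = w - alpha[j] * Q[:, j]
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full reorthogonalization
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:                # Krylov space exhausted early
                m = j + 1
                break
            Q[:, j + 1] = w / beta[j]
    Q, alpha, beta = Q[:, :m], alpha[:m], beta[:m - 1]
    # T = Q^T Sigma Q is small and tridiagonal; take its square root exactly.
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    sqrt_T = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T
    e1 = np.zeros(m); e1[0] = 1.0
    return np.linalg.norm(xi) * (Q @ (sqrt_T @ e1))

# Toy check against an exact square root on a small dense Sigma.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
Sigma = B @ B.T + 50 * np.eye(50)              # well-conditioned PSD matrix
xi = rng.standard_normal(50)
approx = lanczos_sqrt_mv(lambda v: Sigma @ v, xi, k=20)

evals, evecs = np.linalg.eigh(Sigma)
exact = evecs @ (np.sqrt(evals) * (evecs.T @ xi))
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # small error
```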

These computational advances push time and memory complexity from cubic/quadratic to linear or near-linear in sequence length, enabling the handling of high-frequency, long time series.

4. End-to-End Discriminative Training

The architecture supports discriminative, end-to-end training via reparameterized sampling:

$$z = \mu + \Sigma^{1/2}\xi,$$

with $\xi \sim N(0, I)$, permitting backpropagation through both the GP adapter and the classifier (or anomaly detector). The joint training objective is:

$$(w^*, \theta^*) = \arg\min_{w,\theta} \sum_{i=1}^N \mathbb{E}_{z_i \sim N(\mu_i, \Sigma_i; \theta)}[\ell(f(z_i; w), y_i)].$$

This allows the GP hyperparameters to be tuned to directly minimize the downstream anomaly detection loss, leading to representations optimized for the anomaly detection task and for propagating useful uncertainty information.
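
A minimal PyTorch sketch of this joint objective appears below, training a kernel hyperparameter and a downstream head together through reparameterized samples. The kernel form, head architecture, and data are illustrative assumptions, and Cholesky stands in for the Lanczos factorization used at scale.

```python
import torch

torch.manual_seed(0)
x = torch.linspace(0.0, 3.0, 8)                 # fixed reference points

log_lengthscale = torch.zeros((), requires_grad=True)  # GP hyperparameter (theta)
log_noise = torch.tensor(-2.0, requires_grad=True)     # log noise variance
head = torch.nn.Linear(8, 1)                           # downstream detector f(.; w)
opt = torch.optim.Adam([log_lengthscale, log_noise, *head.parameters()], lr=1e-2)

def kernel(a, b):
    d = a[:, None] - b[None, :]
    return torch.exp(-0.5 * (d / torch.exp(log_lengthscale)) ** 2)

def adapter(t, v):
    """GP posterior (mu, Sigma) at the reference points x."""
    K_tt = kernel(t, t) + torch.exp(log_noise) * torch.eye(len(t))
    K_xt = kernel(x, t)
    mu = K_xt @ torch.linalg.solve(K_tt, v)
    Sigma = kernel(x, x) - K_xt @ torch.linalg.solve(K_tt, K_xt.T)
    return mu, Sigma

# One toy labeled sequence: irregular times, values, binary label.
t = torch.tensor([0.1, 0.5, 1.9, 2.7]); v = torch.sin(t); y = torch.tensor(1.0)

for step in range(200):
    mu, Sigma = adapter(t, v)
    L = torch.linalg.cholesky(Sigma + 1e-5 * torch.eye(len(x)))
    xi = torch.randn(16, len(x))                # 16 Monte Carlo samples
    z = mu + xi @ L.T                           # reparameterized draws
    logits = head(z).squeeze(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, y.expand_as(logits))            # expected-loss estimate
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the samples are differentiable functions of $\mu$, $\Sigma$, and hence the kernel hyperparameters, gradients from the downstream loss update the adapter and the detector jointly.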

5. Adaptation for Anomaly Detection

Although originally proposed for classification, the GP adapter paradigm generalizes naturally to anomaly detection in irregularly sampled time series:

  • The GP-derived mean and covariance encode not just the expected value of the latent time series under normality, but also epistemic uncertainty (from sparse sampling) and aleatoric uncertainty (inherent noise).
  • Anomalies can be scored by the Mahalanobis distance in the projected latent space, weighted by the GP-derived covariance (see the sketch after this list):

$$\mathrm{score}(z) = (z - \mu)^\top \Sigma^{-1} (z - \mu)$$

  • The uncertainty-aware representation allows the anomaly detector to differentiate between rare, poorly sampled data and true outlier behavior, resolving ambiguity that plagues naive imputation or interpolation approaches.
  • Any standard anomaly detector that operates on fixed-dimensional inputs (e.g., one-class SVM, autoencoder, isolation forest) can process the GP-adapted features, inheriting uncertainty quantification from the regression step.
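
The sketch below illustrates such uncertainty-aware scoring, reusing the `gp_posterior` helper from the Section 2 sketch. The normality model (pooled mean and covariance over GP-adapted normal sequences) and the synthetic data are illustrative assumptions, not a construction given in the paper.

```python
import numpy as np

def mahalanobis_score(z, mu, Sigma):
    """score(z) = (z - mu)^T Sigma^{-1} (z - mu); larger = more anomalous."""
    r = z - mu
    return float(r @ np.linalg.solve(Sigma, r))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 8)                   # fixed reference points

# Normality model: pool GP-adapted means of irregular "normal" sequences.
reps = []
for _ in range(200):
    t_i = np.sort(rng.uniform(0.0, 3.0, size=rng.integers(4, 10)))
    v_i = np.sin(t_i) + 0.1 * rng.standard_normal(len(t_i))
    reps.append(gp_posterior(t_i, v_i, x)[0])  # posterior mean only
reps = np.stack(reps)
mu_hat = reps.mean(axis=0)
Sigma_hat = np.cov(reps, rowvar=False) + 1e-6 * np.eye(len(x))

# Score a new, sparsely observed sequence. Adding its posterior covariance
# Sigma_new widens the metric where the GP is uncertain, so sparsity alone
# is not mistaken for anomalous behavior.
t_new = np.array([0.2, 2.8]); v_new = np.cos(t_new)   # off-pattern values
mu_new, Sigma_new = gp_posterior(t_new, v_new, x)
score = mahalanobis_score(mu_new, mu_hat, Sigma_hat + Sigma_new)
```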

The architecture supports joint optimization of the GP adapter and an anomaly detection head, allowing for discriminative tuning of the full system toward robust outlier identification in the presence of temporal irregularity.

6. Practical Implementation and Applications

The framework accommodates variable-length or time-stamped records and is applicable to:

  • Healthcare: Modeling medical event records, vital sign logs, or lab measurements with variable revisit schedules.
  • IoT and Remote Sensing: Processing sensor telemetry streams where transmission gaps, asynchronous reporting, or event-driven logging are standard.
  • Industrial Monitoring: Handling systems with non-periodic downtime, maintenance intervals, or missing data segments.

Resource requirements depend on the number of reference points, inducing points for SKI, and desired approximation accuracy. For long sequences, computational cost remains manageable thanks to the near-linear scaling of the SKI and Lanczos combination.

A spectrum of trade-offs exists between representation fidelity, computational cost, and latency—configurable via the number of reference and inducing points, and the Krylov projection rank in Lanczos sampling.
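
As a purely hypothetical illustration of these knobs, a configuration object might expose them explicitly; the field names and defaults below are invented for this sketch and do not correspond to any released implementation.

```python
from dataclasses import dataclass

@dataclass
class GPAdapterConfig:
    """Hypothetical knobs trading representation fidelity for compute."""
    n_reference: int = 64    # reference points: latent representation fidelity
    n_inducing: int = 256    # SKI inducing points: kernel approximation accuracy
    lanczos_rank: int = 20   # Krylov rank for Sigma^{1/2} xi: sampling accuracy
    jitter: float = 1e-6     # numerical stabilizer for covariance operations

# A lower-latency profile for streaming use, trading fidelity for speed.
fast = GPAdapterConfig(n_reference=32, n_inducing=128, lanczos_rank=10)
```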

7. Limitations and Future Directions

  • The GP's stationarity assumptions or choice of kernel may bias representations if the data exhibit non-stationary dynamics.
  • While the approach propagates uncertainty effectively, if the anomaly of interest is a distributional change not captured by the kernel (e.g., sharp regime shifts), augmenting the kernel or adapting the reference set may be necessary.
  • In extremely high-dimensional multivariate time series, computation can still become a bottleneck, though approaches such as sparse GPs or variational inference offer further scaling avenues.

Ongoing research extends these principles to joint modeling of labels, temporal segmentation, and handling even richer forms of irregularity (e.g., asynchronous multivariate observation times).


In summary, the anomaly detection framework outlined in Li et al. (2016) leverages a Gaussian process adapter to project irregularly sampled time series into a fixed, uncertainty-aware latent representation. Combined with scalable inference (SKI and Lanczos), end-to-end discriminative training, and uncertainty-propagating loss functions, this architecture enables robust, high-throughput anomaly detection for complex real-world temporal data with irregular sampling. The integration of uncertainty into the detection process not only improves accuracy under data sparsity but also provides principled anomaly scores appropriately calibrated to observation density and uncertainty.

References

 1. Li, S. C.-X., & Marlin, B. M. (2016). A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification. Advances in Neural Information Processing Systems 29 (NIPS 2016).
