
Contrastive Time Series Forecasting with Anomalies (2512.11526v1)

Published 12 Dec 2025 in cs.LG, cs.AI, and stat.ML

Abstract: Time series forecasting predicts future values from past data. In real-world settings, some anomalous events have lasting effects and influence the forecast, while others are short-lived and should be ignored. Standard forecasting models fail to make this distinction, often either overreacting to noise or missing persistent shifts. We propose Co-TSFA (Contrastive Time Series Forecasting with Anomalies), a regularization framework that learns when to ignore anomalies and when to respond. Co-TSFA generates input-only and input-output augmentations to model forecast-irrelevant and forecast-relevant anomalies, and introduces a latent-output alignment loss that ties representation changes to forecast changes. This encourages invariance to irrelevant perturbations while preserving sensitivity to meaningful distributional shifts. Experiments on the Traffic and Electricity benchmarks, as well as on a real-world cash-demand dataset, demonstrate that Co-TSFA improves performance under anomalous conditions while maintaining accuracy on normal data. An anonymized GitHub repository with the implementation of Co-TSFA is provided and will be made public upon acceptance.

Summary

  • The paper introduces Co-TSFA, a novel framework that employs contrastive latent-output alignment to differentiate between forecast-relevant regime shifts and transient input noise.
  • It demonstrates significant performance boosts, achieving up to 10× reduction in MSE for input-output anomalies across various forecasting backbones.
  • The model-agnostic approach maintains clean accuracy while ensuring graceful degradation under increasing anomaly severity, paving the way for reliable real-world deployment.

Contrastive Time Series Forecasting with Latent-Output Alignment under Anomalies

Background and Motivation

Modern time-series forecasting is foundational in numerous critical domains, including financial risk management, energy grids, and operational logistics. In practice, forecast accuracy is challenged by the irregular presence of anomalous events in both the historical (input) and to-be-forecasted (output) windows. Classical and recent neural forecasting models—including ARIMA, LSTM, Transformer architectures, and time-series foundation models—typically either ignore or indiscriminately suppress all input anomalies, failing to distinguish transient, forecast-irrelevant noise from distributional shifts that signal genuine regime change. This failure to discriminate leaves models either unable to adapt when persistent anomalies arise or, conversely, over-sensitive, propagating spurious noise into the prediction.

The paper "Contrastive Time Series Forecasting with Anomalies" (2512.11526) formalizes the need for a framework that is both robust to input-only anomalies and responsive to forecast-relevant distributional shifts, especially when both types of anomalies occur at inference time. This is a pressing challenge for reliable real-world deployment, yet largely unaddressed by robust forecasting and contrastive representation learning frameworks which focus either on training-time contamination or do not differentiate between anomaly types.

Methodology: The Co-TSFA Framework

The proposed solution, Co-TSFA, is a model-agnostic contrastive regularization approach that applies to any encoder-based time-series forecasting architecture. During training, Co-TSFA systematically injects synthetic but realistic anomalies via controlled augmentations. These consist of two classes (a minimal code sketch follows the list):

  • Input-only augmentations: Simulate transient, forecast-irrelevant disturbances (e.g., corrupted sensor readings limited to the input window). The downstream targets are left unchanged, enforcing that such anomalies do not affect the prediction.
  • Input-output augmentations: Simulate persistent regime shifts extending from the input window into the forecast horizon (e.g., shifts in demand patterns during sustained crises), requiring the model to adapt its forecast accordingly.
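
The summary does not pin down the exact augmentation operators, so the sketch below is illustrative only: it assumes a transient spike for the input-only class and a persistent level shift for the input-output class, on batched univariate series of shape (batch, time). The operator choices and magnitudes are assumptions, not the authors' implementation.

```python
import torch

def input_only_augment(x, y, magnitude=3.0):
    """Forecast-irrelevant anomaly: inject a transient spike into the input
    window only. x: (batch, time), y: (batch, horizon); y is returned unchanged."""
    x_aug = x.clone()
    batch, time = x.shape
    t = torch.randint(0, time, (batch,))                  # one spike position per series
    x_aug[torch.arange(batch), t] += magnitude * x.std(dim=1)
    return x_aug, y                                       # same target: model should ignore the spike

def input_output_augment(x, y, shift=2.0):
    """Forecast-relevant anomaly: a level shift that starts in the input
    window and persists through the forecast horizon."""
    x_aug, y_aug = x.clone(), y.clone()
    start = int(torch.randint(0, x.shape[1], (1,)))       # onset of the regime change
    delta = shift * x.std(dim=1, keepdim=True)            # per-series shift magnitude
    x_aug[:, start:] += delta                             # shift begins in the input...
    y_aug += delta                                        # ...and carries into the target
    return x_aug, y_aug                                   # model should adapt its forecast
```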

Central to Co-TSFA is a latent-output alignment loss: for each mini-batch, the framework computes a similarity between the latent representations of the original and augmented inputs, and analogously between their ground-truth or perturbed outputs. The regularization loss penalizes the discrepancy between these two similarities, explicitly coupling representational change to change in the predicted outcome. The similarity metric is an InfoNCE-style softmax-normalized dot-product computed over time and batch, with negatives drawn at the batch level. The overall training loss combines the conventional prediction loss (e.g., MSE) with this contrastive latent-output alignment regularizer.
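
The summary specifies the similarity form but leaves the exact discrepancy penalty open; the following is one plausible instantiation in PyTorch, using a KL divergence between the latent-side and output-side similarity distributions. The tensor shapes, temperature, and choice of KL as the penalty are assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def softmax_similarity(a, b, temperature=0.1):
    """InfoNCE-style similarity: softmax-normalized dot products, with the
    rest of the batch acting as negatives. a, b: (batch, dim)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature          # (batch, batch) affinity matrix
    return F.softmax(logits, dim=-1)          # row i: distribution over candidates

def latent_output_alignment_loss(z, z_aug, y, y_aug, temperature=0.1):
    """Penalize discrepancy between latent-space and output-space similarity,
    tying representation change to forecast change. z, z_aug: (batch, dim);
    y, y_aug: (batch, horizon) or (batch, horizon, channels)."""
    sim_latent = softmax_similarity(z, z_aug, temperature)
    sim_output = softmax_similarity(y.flatten(1), y_aug.flatten(1), temperature)
    # One plausible discrepancy penalty: KL between the two distributions,
    # with the output side detached so gradients shape only the encoder.
    return F.kl_div(torch.log(sim_latent + 1e-8), sim_output.detach(),
                    reduction="batchmean")
```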

This mechanism not only enforces invariance to irrelevant perturbations but also prevents excessive invariance that would suppress forecast-relevant input-output anomalies, a limitation of traditional robust and contrastive learning methods in this domain.

Experimental Evaluation

Co-TSFA is evaluated extensively on canonical benchmarks (Traffic, Electricity), a realistic and highly irregular cash-withdrawal dataset, and the ETTh1 benchmark. Evaluations cover clean, input-only, and input-output anomaly regimes at test time; both clean and contaminated training data; and a comparison with RobustTSF, a state-of-the-art method that targets robust training under pointwise anomalies but is not equipped for persistent, test-time, or input-output anomaly settings.

Key findings:

  • Superior anomaly adaptation: Across all backbones (TimesNet, iTransformer, TimeXer, Autoformer, PAttn, Informer), Co-TSFA yields lower MAE, MSE, and SMAPE when anomalies occur during inference. The strongest relative gains occur in the input-output anomaly regime (up to a 10× MSE reduction in challenging settings), a context where conventional robust methods fail to adapt and suffer severe error escalation.
  • Robust generalization without sacrificing clean accuracy: When evaluated on clean test data, Co-TSFA shows no statistically significant degradation, indicating that its regularization does not harm nominal generalization.
  • Graceful degradation under increasing anomaly severity/ratio: Co-TSFA maintains gradual performance decline as anomaly severity increases, whereas robust baselines degrade precipitously—an essential attribute for real-world deployments.
  • Training stability: The regularization does not induce training instabilities, even under high anomaly contamination.

Theoretical and Practical Implications

This study advances the theoretical understanding of anomaly-aware learning in time series. By directly coupling representational shifts to the forecast-relevant outcome shifts—rather than maximizing invariance or indiscriminately suppressing anomalous variation—Co-TSFA implements a more calibrated inductive bias. This design ensures that adaptation to regime shifts is possible without increased vulnerability to input noise.

Practically, the method is immediately applicable as a regularization module for any forecasting backbone without architecture modifications. This suggests straightforward adoption in production systems where test-time anomalies are irregular and the anomaly type (irrelevant versus relevant) cannot be known a priori.
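
As a usage illustration, here is a hedged sketch of how the pieces above could wrap an arbitrary backbone's training step, reusing the imports and helpers from the earlier sketches. The assumption that `model(x)` returns its encoder latent alongside the forecast, and the weight `lambda_align`, are hypothetical.

```python
def co_tsfa_training_step(model, x, y, lambda_align=0.1):
    """One Co-TSFA-style step on a clean batch (x, y). Assumes
    model(x) -> (latent z, forecast y_hat); lambda_align is a
    hypothetical regularization weight."""
    x_in, y_in = input_only_augment(x, y)        # targets unchanged
    x_io, y_io = input_output_augment(x, y)      # targets shifted with the input

    z, y_hat = model(x)
    z_in, _ = model(x_in)
    z_io, _ = model(x_io)

    pred_loss = F.mse_loss(y_hat, y)             # conventional forecasting loss
    align_loss = (latent_output_alignment_loss(z, z_in, y, y_in)
                  + latent_output_alignment_loss(z, z_io, y, y_io))
    return pred_loss + lambda_align * align_loss # combined objective
```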

The approach is orthogonal to architectural innovations in time-series forecasting and may, in combination with foundation models (e.g., large time-series transformers), yield further robustness improvements. The explicit modeling of input-output relevance distinction is also amenable to extension toward multi-step anomaly anticipation or probabilistic forecasting under non-stationarity.

Future Directions

This framework opens several lines of future inquiry:

  • Extension to multivariate and cross-domain regime shifts: Further generalization where forecast relevance is output-channel/time-specific.
  • Contrastive regularization in probabilistic and uncertainty-aware forecasting heads: Adapting latent-output alignment to full predictive distributions rather than point estimates.
  • Online and continual anomaly-aware learning: Adapting Co-TSFA to settings where new anomaly types or regimes are encountered incrementally.
  • Formal sample complexity analysis: Theoretical characterization of the statistical efficiency of alignment regularization in the presence of mixed anomaly types.

Conclusion

Co-TSFA presents a technically principled and empirically validated approach for improving robustness and adaptability of time-series forecasting models under diverse test-time anomaly scenarios. It leverages contrastive latent-output alignment to enforce that only forecast-relevant deviations in the input result in representational and predictive changes, producing substantial generalization improvements under challenging distributional shifts without sacrificing clean accuracy. This methodology delineates the path toward more reliable deployment of neural forecasting systems in non-stationary and anomaly-prone environments.
