xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection

Published 28 Jun 2025 in cs.LG and cs.AI | (2506.22837v1)

Abstract: The recently proposed xLSTM is a powerful model that leverages expressive multiplicative gating and residual connections, providing the temporal capacity needed for long-horizon forecasting and representation learning. This architecture has demonstrated success in time series forecasting, lossless compression, and even large-scale language modeling tasks, where its linear memory footprint and fast inference make it a viable alternative to Transformers. Despite its growing popularity, no prior work has explored xLSTM for anomaly detection. In this work, we fill this gap by proposing xLSTMAD, the first anomaly detection method that integrates a full encoder-decoder xLSTM architecture, purpose-built for multivariate time series data. Our encoder processes input sequences to capture historical context, while the decoder is devised in two separate variants of the method. In the forecasting approach, the decoder iteratively generates forecasted future values (xLSTMAD-F), while the reconstruction approach reconstructs the input time series from its encoded counterpart (xLSTMAD-R). We investigate the performance of two loss functions: Mean Squared Error (MSE) and Soft Dynamic Time Warping (SoftDTW), to consider local reconstruction fidelity and global sequence alignment, respectively. We evaluate our method on the comprehensive TSB-AD-M benchmark, which spans 17 real-world datasets, using state-of-the-art challenging metrics such as VUS-PR. In our results, xLSTM showcases state-of-the-art accuracy, outperforming 23 popular anomaly detection baselines. Our paper is the first work revealing the powerful modeling capabilities of xLSTM for anomaly detection, paving the way for exciting new developments on this subject. Our code is available at: https://github.com/Nyderx/xlstmad


Authors (4)

Summary

The paper introduces xLSTMAD, an encoder-decoder xLSTM architecture that reports state-of-the-art results for multivariate time series anomaly detection.
It comes in two variants, forecasting and reconstruction, each trained with either MSE or SoftDTW loss; SoftDTW is intended to tolerate temporal misalignments.
Experiments on the 17 datasets of the TSB-AD-M benchmark show xLSTMAD outperforming 23 baselines, suggesting potential for real-world industrial applications.


The paper introduces xLSTMAD, an encoder-decoder architecture built entirely from xLSTM blocks and designed for multivariate time series anomaly detection. It is the first work to systematically apply xLSTM, a recent generalization of LSTM with enhanced memory and parallelization properties, to anomaly detection. The authors present two variants: a forecasting-based approach (xLSTMAD-F) and a reconstruction-based approach (xLSTMAD-R), each evaluated with both Mean Squared Error (MSE) and Soft Dynamic Time Warping (SoftDTW) losses. The method is benchmarked on the TSB-AD-M suite, which comprises 17 real-world datasets, and outperforms 23 established baselines.

Technical Contributions

The main technical contributions are as follows:

Encoder-Decoder xLSTM Architecture: The model leverages xLSTM blocks, which combine depthwise convolutions, residual connections, and multiplicative gating. The encoder processes input sequences to capture historical context, while the decoder is used for either forecasting or reconstruction.
Hybrid Stacking of mLSTM and sLSTM: The architecture integrates both matrix-valued (mLSTM) and scalar (sLSTM) memory cells, balancing expressiveness and stability. Residual connections and selective gating/feedforward components further enhance model capacity.
Dual Anomaly Detection Strategies: xLSTMAD-F predicts future values and uses prediction error as the anomaly score, while xLSTMAD-R reconstructs the input and uses reconstruction error. This duality allows adaptation to different anomaly detection scenarios.
Loss Function Exploration: Both MSE and SoftDTW losses are evaluated. SoftDTW, in particular, enables the model to be robust to temporal misalignments, which is critical in real-world time series.
Comprehensive Benchmarking: The method is evaluated on the TSB-AD-M benchmark, using challenging metrics such as VUS-PR and VUS-ROC, which are robust to window size and anomaly cardinality.
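Both scoring strategies described above reduce to the same computation: a per-timestep error between the observed values and the model output. The following is a minimal sketch in plain Python, assuming squared error averaged over features as the score. The function name `anomaly_scores` and the averaging choice are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def anomaly_scores(observed: Sequence[Sequence[float]],
                   predicted: Sequence[Sequence[float]]) -> List[float]:
    """Per-timestep anomaly score: mean squared error across features.

    Both arguments are [T][F] nested sequences. For a forecasting model,
    `predicted` holds the forecasts; for a reconstruction model, it holds
    the reconstructed window. (Hypothetical helper, not the paper's code.)
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    scores = []
    for obs_t, pred_t in zip(observed, predicted):
        if len(obs_t) != len(pred_t):
            raise ValueError("feature dimensions differ")
        sq_err = [(o - p) ** 2 for o, p in zip(obs_t, pred_t)]
        scores.append(sum(sq_err) / len(sq_err))
    return scores


# Toy example: the model expects a flat signal, but the last step spikes.
observed = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.0]]
predicted = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
scores = anomaly_scores(observed, predicted)
```

In this toy case only the last timestep receives a non-zero score, so a threshold on the score would flag it as anomalous.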

Empirical Results

The empirical evaluation is extensive, with 720 training and test executions across 17 datasets. Key findings include:

State-of-the-Art Performance: xLSTMAD-R (MSE) achieves a VUS-PR of 0.37, outperforming the best baseline (CNN) by approximately 20% and the random model by 370%. xLSTMAD-F (MSE) achieves the highest AUC-PR (0.35), AUC-ROC (0.74), and VUS-ROC (0.77).
Robustness Across Domains: The model consistently outperforms LSTM and CNN baselines on industrial, physiological, and human activity datasets. For example, on the Daphnet dataset, xLSTMAD-R (MSE) achieves a VUS-PR of 0.50, compared to 0.31 for LSTM and 0.21 for CNN.
Complementary Strengths: The reconstruction and forecasting variants show complementary strengths, with xLSTMAD-R dominating in VUS-PR and xLSTMAD-F excelling in VUS-ROC.
Loss Function Impact: While MSE generally yields the best results, SoftDTW provides competitive performance, particularly in scenarios with temporal misalignments.
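To illustrate why a loss such as SoftDTW can be more tolerant of temporal misalignment than MSE, the sketch below compares the two on a toy pulse. It is a plain-Python implementation of the standard soft-DTW recursion for univariate sequences, not the paper's training code. The smoothing value `gamma = 0.1` and the toy signals are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def soft_min(values: Sequence[float], gamma: float) -> float:
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    m = min(finite)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in finite))


def soft_dtw(x: Sequence[float], y: Sequence[float], gamma: float = 0.1) -> float:
    """Soft-DTW discrepancy between two univariate sequences."""
    n, m = len(x), len(y)
    # R[i][j] holds the soft alignment cost of x[:i] against y[:j].
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + soft_min(
                (R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma)
    return R[n][m]


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


pulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # same shape, one step late
flat = [0.0] * 6                           # the pulse is missing entirely

# MSE ranks the flat signal as closer to the pulse than the shifted copy;
# soft-DTW ranks the shifted copy as closer.
mse_prefers_flat = mse(pulse, flat) < mse(pulse, shifted)
sdtw_prefers_shifted = soft_dtw(pulse, shifted) < soft_dtw(pulse, flat)
```

Note that soft-DTW values can be negative, because the smooth minimum is never larger than the hard minimum; only the relative ordering matters here.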

Architectural and Implementation Details

The xLSTMAD architecture is characterized by:

Input Projection: Input sequences are projected into a $D$-dimensional embedding space using a linear layer followed by GELU activation.
Encoder: Stacked xLSTM blocks, each comprising an mLSTM (with Conv1D) and optionally an sLSTM, with residual connections.
Decoder: Initialized from the encoder's final state, the decoder generates either forecasts (xLSTMAD-F) or reconstructions (xLSTMAD-R) in an autoregressive manner.
Losses: MSE is used for pointwise fidelity, while SoftDTW is used for phase-aware, shape-consistent alignment.
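The components listed above can be written compactly as follows. This is a notational sketch rather than the paper's own formulation: the symbols $W_{\mathrm{in}}$, $b_{\mathrm{in}}$, $T$, and $\gamma$ are assumptions introduced here, with $T$ standing for the length of the target window and $F$ for the number of features. The soft-DTW recursion is the standard one from the soft-DTW literature.

```latex
% Input projection applied to each time step x_t \in \mathbb{R}^{F}
h_t = \mathrm{GELU}\left(W_{\mathrm{in}} x_t + b_{\mathrm{in}}\right),
\qquad W_{\mathrm{in}} \in \mathbb{R}^{D \times F},\; b_{\mathrm{in}} \in \mathbb{R}^{D}

% Pointwise loss between target Y and prediction \hat{Y}, both in \mathbb{R}^{T \times F}
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{T F} \sum_{t=1}^{T} \lVert y_t - \hat{y}_t \rVert_2^2

% Soft-DTW: smoothed alignment cost with temperature \gamma > 0
r_{i,j} = \lVert y_i - \hat{y}_j \rVert_2^2
        + \operatorname{softmin}_{\gamma}\left(r_{i-1,j},\, r_{i,j-1},\, r_{i-1,j-1}\right),
\qquad
\operatorname{softmin}_{\gamma}(a_1, \dots, a_k) = -\gamma \log \sum_{m=1}^{k} e^{-a_m / \gamma}

% Boundary conditions and final loss
r_{0,0} = 0, \quad r_{i,0} = r_{0,j} = \infty \;\; (i, j \ge 1),
\qquad \mathcal{L}_{\mathrm{SoftDTW}} = r_{T,T}
```

As $\gamma \to 0$ the soft minimum approaches the hard minimum and the loss approaches classical DTW with squared Euclidean cost.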

High-level, PyTorch-style pseudocode for the training loop is as follows:

for batch in dataloader:
    X = batch['input']  # input window, shape: [B, W, F]
    if mode == 'forecast':
        y_true = batch['future']  # next p steps, shape: [B, p, F]
        y_pred = model.forecast(X)
    else:  # reconstruction
        y_true = X  # the target is the input window itself
        y_pred = model.reconstruct(X)
    # The loss choice is the same for both modes
    loss_fn = mse_loss if loss_type == 'mse' else soft_dtw_loss
    loss = loss_fn(y_pred, y_true)
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()
    optimizer.step()
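The loop above reads an input window of shape [B, W, F] and, in forecasting mode, a future window of shape [B, p, F]. One way such pairs could be cut from a single multivariate series is a sliding window, sketched below in plain Python. The function name `make_windows`, the stride of 1, and the dictionary keys are assumptions that mirror the pseudocode, not the paper's data pipeline; batching is omitted.

```python
from typing import Dict, List

Series = List[List[float]]  # nested list of shape [T][F]


def make_windows(series: Series, window: int, horizon: int) -> List[Dict[str, Series]]:
    """Slice a [T][F] series into (input, future) pairs with stride 1.

    Each sample has an 'input' of `window` steps followed immediately by
    a 'future' of `horizon` steps. (Hypothetical helper for illustration.)
    """
    if window <= 0 or horizon <= 0:
        raise ValueError("window and horizon must be positive")
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        split = start + window
        samples.append({
            "input": series[start:split],
            "future": series[split:split + horizon],
        })
    return samples


# Toy series with T = 6 time steps and F = 2 features.
series = [[float(t), float(10 * t)] for t in range(6)]
samples = make_windows(series, window=3, horizon=2)
```

For reconstruction mode only the `input` part of each sample is needed, since the target is the input window itself.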

Implications and Future Directions

Practical Implications:

Scalability: The parallelizable design of xLSTM, especially the mLSTM variant, enables efficient training and inference on long sequences and high-frequency data streams, making it suitable for real-time anomaly detection in industrial and IoT settings.
Flexibility: The dual approach (forecasting and reconstruction) allows practitioners to tailor the method to the anomaly characteristics of their domain.
Robustness: The use of SoftDTW as a loss function provides robustness to phase shifts and temporal distortions, which are common in sensor and physiological data.

Theoretical Implications:

The results challenge the prevailing assumption that more complex architectures (e.g., Transformers) are always superior for time series anomaly detection. The xLSTMAD results suggest that expressive recurrence, when properly designed, can outperform attention-based models in this domain.
The integration of matrix-valued memory and exponential gating in xLSTM provides a new direction for sequence modeling, particularly for tasks requiring long-term dependency tracking and efficient memory usage.

Future Developments:

Model Compression and Efficiency: Further work could explore pruning, quantization, or distillation of xLSTMAD for deployment on edge devices.
Hybrid Architectures: Combining xLSTM with attention mechanisms or graph-based modules may further enhance anomaly detection in complex, structured time series.
Explainability: Investigating the interpretability of xLSTMAD's anomaly scores and memory states could provide actionable insights in critical applications such as industrial monitoring and healthcare.

Conclusion

xLSTMAD establishes a new state of the art for multivariate time series anomaly detection on TSB-AD-M, demonstrating that advanced recurrent architectures with efficient memory and gating mechanisms can outperform both classical and Transformer-based baselines. The two detection variants, the choice of loss functions, and the comprehensive benchmarking provide a strong foundation for future research and practical deployment in diverse anomaly detection scenarios.
