
LSTM-Autoencoder Anomaly Detection for DDoS

Updated 29 November 2025
  • The paper introduces an LSTM autoencoder architecture that identifies DDoS attack anomalies with high precision using reconstruction-error thresholding.
  • The methodology converts network flow records into sliding windows processed by LSTM layers to capture temporal dependencies effectively.
  • Rigorous evaluation on the CICDDoS2019 dataset demonstrates superior detection metrics compared to classical machine learning models, supporting real-time, scalable deployment.

Long Short-Term Memory (LSTM) autoencoder-based distributed anomaly detection is an advanced technique for identifying Distributed Denial-of-Service (DDoS) attack anomalies in network traffic, particularly effective against unseen, reflection-based attacks. By leveraging the temporal modeling capacities of LSTM networks and the representation learning capabilities of autoencoders, the approach is tailored to multivariate time-series data aggregated from network flow records. In this context, the LSTM-Autoencoder (LSTM-AE) architecture is combined with a thresholding strategy on reconstruction error to achieve high accuracy and robust deployment in large-scale, real-time, distributed environments (Wei et al., 2023).

1. LSTM-Autoencoder Network Architecture

The detection architecture processes input data as sliding windows of $t$ flow records, each represented by an $m$-dimensional numerical feature vector. In the deployed model, $m = 5$, corresponding to the features “Max Packet Length,” “Fwd Packet Length Max,” “Fwd Packet Length Min,” “Average Packet Size,” and “Min Packet Length.” Window size $t$ is tunable, with experiments using values such as 10, 50, or 100 milliseconds.

Processing proceeds as follows:

  • Input Preparation: Raw flow data, extracted from packet capture (pcap) files to CSV, are reshaped to arrays of shape $(n_\text{samples}, t, m)$.
  • Encoder: A single LSTM layer with 16 hidden units (tanh activation, dropout 0.2) encodes the window sequence, producing a $1 \times 16$ bottleneck vector from the final LSTM state.
  • Repeat Vector: This low-dimensional code is repeated $t$ times to match the time window structure.
  • Decoder: A second LSTM layer (also 16 units, tanh, dropout 0.2) reconstructs the sequence, producing $(t, 16)$ outputs with return_sequences enabled.
  • TimeDistributed Dense Layer: Each decoded vector is mapped back to $m = 5$ dimensions, yielding the predicted reconstruction $\hat{X}$ with shape $(t, m)$, aiming to match the original input $X$.

The full architecture is optimized for compactness, low latency, and effective feature representation.
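
A minimal Keras sketch consistent with the description above, assuming TensorFlow/Keras as the implementation framework (the paper's vocabulary maps directly onto RepeatVector and TimeDistributed layers); the windowing helper and its stride parameter are illustrative additions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def make_windows(flows, t, stride=1):
    """Slide a length-t window over a (n_records, m) feature array,
    yielding an array of shape (n_samples, t, m)."""
    return np.stack([flows[i:i + t] for i in range(0, len(flows) - t + 1, stride)])

def build_lstm_ae(t=10, m=5, units=16, dropout=0.2):
    inputs = layers.Input(shape=(t, m))
    # Encoder: the final LSTM state is the 1 x 16 bottleneck code
    code = layers.LSTM(units, activation="tanh", dropout=dropout)(inputs)
    # Repeat the code t times to match the window length
    repeated = layers.RepeatVector(t)(code)
    # Decoder: reconstruct the full (t, 16) sequence
    decoded = layers.LSTM(units, activation="tanh", dropout=dropout,
                          return_sequences=True)(repeated)
    # Map each decoded vector back to the m input features
    outputs = layers.TimeDistributed(layers.Dense(m, activation="linear"))(decoded)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mae")
    return model
```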

2. LSTM Cell Dynamics and Governing Equations

Both the encoder and decoder LSTM layers implement the standard gating mechanism at each time step $t$, using the following update equations:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here, $\sigma(\cdot)$ denotes the logistic sigmoid function, $\odot$ represents element-wise multiplication, and $x_t \in \mathbb{R}^m$, $h_t \in \mathbb{R}^{16}$, $c_t \in \mathbb{R}^{16}$. Both encoder and decoder layers apply these updates identically to capture temporal dependencies in the input sequence.
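
For concreteness, a NumPy transcription of one such update step; the weight shapes follow the dimensions above ($x_t \in \mathbb{R}^5$, $h_t, c_t \in \mathbb{R}^{16}$), and the dictionary-of-gates layout is purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM update following the equations above.
    W[g]: (16, m) input weights, U[g]: (16, 16) recurrent weights,
    b[g]: (16,) biases, for each gate g in {"i", "f", "o", "c"}."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])        # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])        # forget gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])        # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate cell
    c = f * c_prev + i * c_tilde   # element-wise cell-state update
    h = o * np.tanh(c)             # new hidden state
    return h, c
```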

3. Loss Function and Anomaly Thresholding

The model is trained as a reconstruction autoencoder, aiming to minimize the discrepancy between input and output windows. The objective function is the mean absolute error (MAE):

$$
L_\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left\| x^{(i)} - \hat{x}^{(i)} \right\|_1
$$

where $x^{(i)}$ and $\hat{x}^{(i)}$ are the true and reconstructed window vectors, respectively. Mean squared error (MSE) is an alternative, but the experiments employ MAE.

Threshold Selection: During training, only benign (non-attack) windows are used. Each window’s reconstruction error is computed, and the anomaly detection threshold is set as the maximum error observed on training data:

$$
\text{threshold} = \max\{\text{reconstruction error on training set}\}
$$

At inference, any window with reconstruction loss exceeding this threshold is identified as anomalous. An alternate statistical rule ($\mu + k\sigma$) may be used but is not adopted in the referenced paper.
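
A sketch of the thresholding and inference rule, assuming a trained `model` (e.g., from the Section 1 sketch) and window arrays `X_train` (benign only) and `X_test` of shape $(n, t, m)$:

```python
import numpy as np

def window_mae(X, X_hat):
    # Mean absolute error per window, averaged over all t*m entries
    return np.mean(np.abs(X - X_hat), axis=(1, 2))

# Threshold: maximum reconstruction error over benign training windows
train_err = window_mae(X_train, model.predict(X_train))
threshold = train_err.max()

# Inference: flag any window whose error exceeds the benign maximum
test_err = window_mae(X_test, model.predict(X_test))
is_anomaly = test_err > threshold

# Alternative statistical rule (not adopted in the paper): mu + k*sigma
# threshold = train_err.mean() + 3.0 * train_err.std()
```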

4. Model Training, Evaluation, and Dataset Strategy

Key experimental settings include:

  • Dataset: CICDDoS2019, comprising reflection-based DDoS attack flows (DNS, LDAP, SNMP) and benign background traffic.
  • Feature engineering: $m = 5$ selected network flow statistics per window.
  • Temporal aggregation: $t \in \{10, 50, 100\}$ milliseconds.
  • Splits: 70% benign for training, 10% benign for validation, 20% mixed for testing (benign + all attacks).
  • Optimization: Adam optimizer, learning rate 0.001, batch size 64, 30 epochs.
  • Regularization: Dropout rate 0.2 on both encoder and decoder LSTM layers.
  • Activation: tanh for LSTMs, linear in the final output layer.
  • Inference workflow: Online, streaming application using a circular buffer of the most recent $t$ records, per-window normalization, and a forward pass through the LSTM-AE (see the sketch after this list).
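
A sketch of the streaming inference loop described in the last bullet above; the per-window min-max normalization is one plausible choice (the paper specifies per-window normalization without further detail here), and `model` and `threshold` are assumed to come from the offline stage:

```python
from collections import deque
import numpy as np

T = 10  # window length in flow records

buffer = deque(maxlen=T)  # circular buffer of the most recent T records

def on_flow_record(record, model, threshold):
    """Push one m-dimensional flow record; score the window once full.
    Returns True (anomalous) / False once T records are buffered, else None."""
    buffer.append(np.asarray(record, dtype=np.float32))
    if len(buffer) < T:
        return None
    window = np.stack(buffer)  # shape (T, m)
    # Per-window min-max normalization (an assumed scheme)
    lo, hi = window.min(axis=0), window.max(axis=0)
    window = (window - lo) / np.where(hi > lo, hi - lo, 1.0)
    recon = model.predict(window[None, ...], verbose=0)[0]
    err = float(np.mean(np.abs(window - recon)))
    return err > threshold
```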

5. Quantitative Detection Performance

Performance benchmarks, obtained on sliding windows of $t = 10$ ms, demonstrate the efficacy of the LSTM-AE method in discriminating reflection-based DDoS attacks:

| Attack Type | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | AUC-ROC (%) |
|-------------|--------------|---------------|------------|--------------|-------------|
| DNS         | 96.08        | 99.99         | 94.30      | 97.06        | ~97.1       |
| LDAP        | 99.96        | 99.99         | 99.93      | 99.96        | ~99.96      |
| SNMP        | 96.89        | 99.99         | 95.49      | 97.69        | ~97.75      |

These results surpass classical machine learning baselines operating on the same features (e.g., F1 ≈ 0.62 for RandomForest, F1 ≈ 0.69 for ID3) and are competitive with or superior to contemporary deep models (Wei et al., 2023).

6. Real-Time Distributed Deployment Strategies

For large-scale network protection, LSTM-AE detectors are deployed as lightweight services or containers at edge routers or network taps. Each instance operates autonomously:

  • Inference latency: Per-window forward-pass cost scales as $O(t \cdot m \cdot \text{hidden units})$; with $m = 5$ and 16 hidden units, real-time processing is practical.
  • Threshold adaptation: Each node dynamically tracks the mean and variance of recent benign error distributions, updating thresholds via exponential averaging to accommodate non-stationary traffic (a sketch follows this list). Nodes may exchange statistics over gossip protocols to ensure global calibration.
  • Alerting: Anomalous windows are flagged locally; metadata (timestamp, source/destination, error magnitude) are transmitted to centralized command and control (C&C) or SIEM systems.
  • Fail-over: Nodes detecting potential model degradation can autonomously retrieve fresh checkpoints from model registries.
  • Scalability: Thousands of monitoring agents can be deployed in parallel, maintaining local sensitivity and enabling cross-network anomaly correlation.
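
The threshold-adaptation bullet above can be sketched as an exponentially weighted estimate of the benign error statistics. The smoothing factor `alpha` and multiplier `k` are assumptions, and note that the referenced paper itself uses the fixed maximum-error threshold of Section 3 rather than this adaptive rule:

```python
import math

class AdaptiveThreshold:
    """Exponentially weighted mean/variance of recent benign reconstruction
    errors, yielding a mu + k*sigma threshold that tracks traffic drift."""

    def __init__(self, alpha=0.01, k=3.0, init_mu=0.0, init_var=1.0):
        self.alpha, self.k = alpha, k          # smoothing factor, sigma multiplier
        self.mu, self.var = init_mu, init_var

    def update(self, err):
        # Incremental exponentially weighted mean/variance update,
        # applied only to windows judged benign
        delta = err - self.mu
        self.mu += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

    @property
    def threshold(self):
        return self.mu + self.k * math.sqrt(self.var)
```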

7. Impact and Significance in Network Security

LSTM autoencoder-based distributed anomaly detection provides an operational method for learning multivariate temporal correlations characteristic of benign traffic, thereby confining normal behavior within a tightly bounded reconstruction error. This scheme yields robust, sub-millisecond detection of novel and known DDoS attack variants with consistently high accuracy, minimal false positives, and resistance to concept drift through continual, localized threshold recalibration. In practical scenarios, the model is trained offline on benign logs, deployed in a resource-efficient footprint to edge sensors, and executed in streaming, sliding-window mode—producing actionable, distributed anomaly alerts that are readily aggregated into a unified security monitoring framework (Wei et al., 2023).
