
DeepEdgeIDS: Unsupervised DRL for IoT Edge Security

Updated 30 November 2025
  • DeepEdgeIDS is an unsupervised deep reinforcement learning IDS that combines an autoencoder for anomaly detection with a deep Q-network for dynamic DDoS mitigation on IoT edge gateways.
  • It employs a multi-objective reward function that balances detection accuracy, response latency, energy consumption, and carbon emissions to rapidly adapt to novel attacks.
  • Experimental results on edge hardware show superior detection rates, reduced latency, and explicit carbon-aware operation compared to supervised DRL alternatives.

DeepEdgeIDS is an unsupervised Deep Reinforcement Learning (DRL)-based Intrusion Detection System (IDS) designed for sustainable and adaptive Distributed Denial-of-Service (DDoS) mitigation in Internet of Things (IoT) edge gateways. Combining an Autoencoder (AE) for anomaly detection with a Deep Q-Network (DQN) for real-time, carbon-aware mitigation policy selection, DeepEdgeIDS addresses the limitations of traditional IDS—including poor adaptability to zero-day threats, dependence on static signatures or labels, and neglect of sustainability metrics such as energy consumption and carbon footprint. Experimental results demonstrate that DeepEdgeIDS attains higher detection accuracy, faster adaptation to novel attacks, and explicit carbon-aware operation when compared to supervised DRL alternatives (Jamshidi et al., 23 Nov 2025).

1. System Architecture and Operational Pipeline

DeepEdgeIDS processes raw network traffic at the edge gateway by employing an unsupervised AE to encode each traffic flow $x_t$ into a latent vector $h_t$ and decode it back to a reconstruction $\hat{x}_t$:

$$h_t = f^{\text{encoder}}(x_t), \quad \hat{x}_t = f^{\text{decoder}}(h_t).$$

The AE reconstruction loss, $L_{AE} = \|x_t - \hat{x}_t\|^2$, also serves as the anomaly score $A_s = \|x_t - \hat{x}_t\|^2$. Flows with $A_s > \tau_a$, where $\tau_a$ is set at the 95th percentile of scores on benign validation traffic, are flagged as suspicious. These flows are then processed by the DQN policy network, whose input state is

$$s_t = \{\text{P}_{\text{rate}}, \text{SYN}_{\text{count}}, \text{ACK}_{\text{count}}, A_s, h_t\}.$$
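A minimal sketch of how $s_t$ might be assembled into a single input vector, assuming NumPy and an illustrative 4-dimensional latent code. The helper name `build_state`, the example values, and the omission of feature normalization are assumptions for illustration, not details from the paper.

```python
import numpy as np


def build_state(p_rate, syn_count, ack_count, anomaly_score, latent):
    """Assemble s_t = {P_rate, SYN_count, ACK_count, A_s, h_t} as one flat vector.

    Hypothetical helper: scaling/normalization of the raw counts is omitted.
    """
    scalars = np.array([p_rate, syn_count, ack_count, anomaly_score], dtype=np.float32)
    latent = np.asarray(latent, dtype=np.float32).ravel()  # h_t from the AE encoder
    return np.concatenate([scalars, latent])


# Illustrative flow statistics and an assumed 4-dimensional latent vector.
s_t = build_state(1200.0, 350, 40, 0.27, [0.1, -0.4, 0.8, 0.0])
print(s_t.shape)  # (8,)
```

The resulting vector length is the number of scalar features plus the latent dimension, which is what the DQN's input layer must match.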

The DQN provides action values $Q(s_t, a)$ over four discrete mitigation actions:

  • $a_1$ = rate limit,
  • $a_2$ = SYN-throttling,
  • $a_3$ = drop packets,
  • $a_4$ = blacklist IPs.

Selected actions are enforced at the gateway via dynamic rule updates (e.g., iptables). Experience replay (buffer size: 50,000; batch size: 64) and $\epsilon$-greedy exploration support stable off-policy learning.
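The replay and exploration mechanics described above can be sketched in plain Python as follows. The class and function names, the action labels, and the use of uniform sampling are illustrative assumptions rather than the paper's implementation.

```python
import random
from collections import deque

# Hypothetical labels for the four mitigation actions a_1..a_4.
ACTIONS = ["rate_limit", "syn_throttle", "drop_packets", "blacklist_ip"]


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=64):
        # Uniform sampling without replacement decorrelates consecutive updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


buf = ReplayBuffer()
for t in range(100):
    buf.push([0.0] * 8, t % 4, 0.0, [0.0] * 8, False)
print(len(buf.sample(64)))  # 64
print(ACTIONS[epsilon_greedy([0.1, 0.5, 0.2, 0.0], epsilon=0.0)])  # syn_throttle
```

A bounded `deque` gives FIFO eviction for free, so the buffer always holds the most recent 50,000 transitions.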

2. Autoencoder-based Anomaly Detection

The AE component is optimized to minimize reconstruction loss:

$$L_{AE} = \|x_t - \hat{x}_t\|^2.$$

During live operation, $A_s$ is computed for each traffic flow, and a flow is deemed anomalous (i.e., DDoS suspected) if $A_s > \tau_a$. Because it requires no labeled attack data, this unsupervised scheme is intended to identify zero-day patterns and non-stationary attack behaviors that signature- or label-based detectors can miss.
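A minimal NumPy sketch of the scoring and calibration steps, assuming the AE's reconstructions are already available as arrays (the encoder and decoder themselves are not shown). Function names are hypothetical.

```python
import numpy as np


def anomaly_scores(x, x_hat):
    """A_s = ||x_t - x_hat_t||^2 for each flow (one flow per row)."""
    diff = np.asarray(x, dtype=np.float64) - np.asarray(x_hat, dtype=np.float64)
    return np.sum(diff ** 2, axis=1)


def calibrate_threshold(benign_scores, percentile=95.0):
    """tau_a = chosen percentile of A_s on benign validation traffic."""
    return float(np.percentile(benign_scores, percentile))


def flag_suspicious(scores, tau_a):
    """Boolean mask of flows whose score exceeds tau_a (DDoS suspected)."""
    return np.asarray(scores) > tau_a


# Illustrative benign validation scores 1..100; NumPy's default linear
# interpolation places the 95th percentile at 95.05.
tau_a = calibrate_threshold(np.arange(1, 101))
print(round(tau_a, 2))  # 95.05
print(flag_suspicious([10.0, 96.0, 200.0], tau_a).tolist())  # [False, True, True]
```

By construction, about 5% of benign validation flows exceed $\tau_a$, which bounds the expected false-positive rate of the detector on traffic resembling the validation set.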

3. Markov Decision Process Formulation and Reward Design

The core DRL mechanism of DeepEdgeIDS is modeled as a Markov Decision Process (MDP) with the states, actions, and multi-objective reward defined below; the DQN policy is optimized against this reward:

  • State space: $s_t = \{\text{P}_{\text{rate}}, \text{SYN}_{\text{count}}, \text{ACK}_{\text{count}}, A_s, h_t\}$.
  • Action space: Four discrete edge-mitigating responses.

The multi-objective reward function is defined as:

$$R(s_t, a_t) = \alpha \cdot DR_t - \beta \cdot FPR_t - \lambda_L \cdot L_{\text{resp}, t} - \delta \cdot E_{\text{overhead}, t} - \epsilon \cdot M_{\text{util}, t} - \zeta \cdot C_{\text{emission}, t},$$

where:

  • $DR_t$ / $FPR_t$: detection and false positive rates,
  • $L_{\text{resp}, t}$: response latency,
  • $E_{\text{overhead}, t} = P_t \Delta t$: energy consumed,
  • $M_{\text{util}, t} = M_{\text{active}} / M_{\text{total}}$: memory utilization,
  • $C_{\text{emission}, t} \approx \kappa_t E_{\text{overhead}, t}$: carbon footprint, with $\kappa_t$ the local grid carbon intensity.
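The reward can be evaluated directly from its definition. The sketch below assumes all terms are already normalized to comparable scales and uses placeholder weights, since values for $\alpha, \beta, \lambda_L, \delta, \epsilon, \zeta$ are not listed here. The memory weight is named `eps_mem` to avoid confusion with the $\epsilon$ of $\epsilon$-greedy exploration, and the inputs in the usage line are invented for illustration.

```python
def reward(dr, fpr, latency, power, dt, mem_active, mem_total, kappa,
           alpha=1.0, beta=1.0, lam_l=1.0, delta=1.0, eps_mem=1.0, zeta=1.0):
    """Multi-objective reward R(s_t, a_t); weights are placeholders, not paper values."""
    energy = power * dt                # E_overhead,t = P_t * dt
    mem_util = mem_active / mem_total  # M_util,t = M_active / M_total
    carbon = kappa * energy            # C_emission,t ~= kappa_t * E_overhead,t
    return (alpha * dr - beta * fpr - lam_l * latency
            - delta * energy - eps_mem * mem_util - zeta * carbon)


# Hypothetical step: DR=0.976, FPR=0.02, 10 ms latency, 5 W for 0.1 s,
# 2 of 8 GB in use, kappa=1e-4 (arbitrary units per joule),
# with the energy and memory terms down-weighted to 0.1.
r = reward(0.976, 0.02, 0.01, 5.0, 0.1, 2.0, 8.0, 1e-4, delta=0.1, eps_mem=0.1)
print(round(r, 5))  # 0.87095
```

Because carbon is modeled as proportional to energy, $\delta$ and $\zeta \kappa_t$ jointly set the effective energy penalty; a time-varying $\kappa_t$ is what makes the carbon term more than a rescaled energy term.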

This formulation is designed to drive continuous adaptation toward high accuracy, low resource usage, and minimal latency while explicitly penalizing carbon impact at the IoT edge.

4. DRL Algorithmic Structure and Network Implementation

DeepEdgeIDS employs a DQN policy whose action values are updated toward the Bellman target:

$$Q(s_t, a_t) \leftarrow R(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a'),$$
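For a minibatch drawn from the replay buffer, the right-hand side of this update can be computed as below. This is a NumPy sketch: masking the bootstrap term on terminal transitions is a standard DQN convention assumed here rather than stated in the source, and `q_next` would come from whichever network (online or a separate target network) the implementation actually uses.

```python
import numpy as np

GAMMA = 0.99  # discount factor from the hyperparameter list


def td_targets(rewards, q_next, dones, gamma=GAMMA):
    """y_t = r_t + gamma * max_a' Q(s_{t+1}, a'), without bootstrapping past terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float64)
    q_next = np.asarray(q_next, dtype=np.float64)        # shape (batch, |A|)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * not_done * q_next.max(axis=1)


y = td_targets([1.0, 0.5],
               [[0.2, 0.8, 0.1, 0.0], [1.0, 0.3, 0.2, 0.4]],
               [False, True])
print(y)  # first target is 1.0 + 0.99 * 0.8; the second is terminal, so just 0.5
```

The network is then trained to minimize the squared error between $Q(s_t, a_t)$ and these targets.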

with hyperparameters:

  • Discount $\gamma = 0.99$,
  • Learning rate $\eta \approx 1 \times 10^{-3}$ (Adam optimizer),
  • Exploration $\epsilon$ annealed from 1.0 to 0.01,
  • Experience replay buffer: 50,000 transitions; batch size: 64,
  • Convergence typically achieved after $2.5 \times 10^5$ steps.
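The source states only the endpoints of the exploration schedule. A linear decay is one common choice; the sketch below assumes it, with a hypothetical `decay_steps` of 100,000 (an exponential schedule would be an equally plausible reading).

```python
def epsilon_at(step, eps_start=1.0, eps_end=0.01, decay_steps=100_000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it constant.

    decay_steps is a hypothetical value; the source gives only the endpoints.
    """
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


for step in (0, 50_000, 250_000):
    print(step, round(epsilon_at(step), 3))
# 0 1.0
# 50000 0.505
# 250000 0.01
```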

The network architecture comprises:

  • Input: dimension matching $s_t$ (≈8 Bot-IoT features + latent dimension),
  • Two fully connected hidden layers (e.g., 128 units, ReLU),
  • Output: $|A| = 4$ linear Q-values.
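A forward pass through this architecture can be sketched in NumPy as follows. The input dimension of 8, the He-style initialization, and the helper names are illustrative assumptions; training with Adam ($\eta \approx 10^{-3}$) is omitted.

```python
import numpy as np


def init_qnet(input_dim, hidden=128, n_actions=4, seed=0):
    """Weights for input -> 128 -> 128 -> |A| fully connected layers."""
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        # He-normal weights (suited to ReLU) and zero biases.
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        return w, np.zeros(n_out)

    return [dense(input_dim, hidden), dense(hidden, hidden), dense(hidden, n_actions)]


def q_forward(params, states):
    """Return Q(s, a) for a batch of states, shape (batch, n_actions)."""
    h = np.asarray(states, dtype=np.float64)
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers; output stays linear
    return h


params = init_qnet(input_dim=8)
q = q_forward(params, np.random.default_rng(1).normal(size=(3, 8)))
print(q.shape)           # (3, 4)
print(q.argmax(axis=1))  # greedy action index per state
```

Keeping the output layer linear matters: Q-values are unbounded returns, so a ReLU or sigmoid there would clip them.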

5. Experimental Setup and Evaluation Methodology

Experiments were conducted on edge hardware (Raspberry Pi 4, 1.5 GHz quad-core CPU, 8 GB RAM) with ESP32 IoT sensor nodes. The Bot-IoT dataset's original 80 features were reduced to 8 via variance analysis, a Pearson correlation filter (threshold 0.85), and recursive feature elimination (retaining 95% of predictive power). The threshold $\tau_a$ was set from the AE reconstruction-error distribution on benign validation traffic.
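The first two reduction stages can be sketched as simple filters. This NumPy version assumes a greedy left-to-right pass for the correlation filter and a near-zero variance cutoff; both are illustrative choices, and the recursive feature elimination stage (which needs a fitted model) is not shown.

```python
import numpy as np


def reduce_features(X, names, var_min=1e-8, corr_max=0.85):
    """Variance filter, then greedy Pearson filter at |r| <= corr_max."""
    X = np.asarray(X, dtype=np.float64)
    # Variance analysis: discard (near-)constant columns.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() > var_min]
    # Correlation filter: keep a column only if it is not highly correlated
    # with any column already selected.
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= corr_max for k in selected):
            selected.append(j)
    return X[:, selected], [names[j] for j in selected]


# Toy data: b = 2a (r = 1), c is constant, d is weakly correlated with a.
X = np.array([[1, 2, 5, 1],
              [2, 4, 5, 0],
              [3, 6, 5, 1],
              [4, 8, 5, 0]])
X_red, kept = reduce_features(X, ["a", "b", "c", "d"])
print(kept)  # ['a', 'd']
```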

Performance metrics and resource utilization were evaluated in both training and live deployment regimes. Statistical significance was assessed via ANOVA, highlighting differences in latency (F = 75.62, p < 0.05, $\eta^2$ = 0.577) and detection probability (F = 67.89, p < 0.05).
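Assuming a one-way design, the reported statistics relate as follows: $F$ is the ratio of between-group to within-group mean squares, and $\eta^2$ is the between-group share of total variance. The NumPy sketch below computes both on toy data (not the paper's measurements); `scipy.stats.f_oneway` returns the same $F$ along with a p-value.

```python
import numpy as np


def one_way_anova(*groups):
    """Return (F, eta^2) for a one-way ANOVA over the given groups."""
    groups = [np.asarray(g, dtype=np.float64) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # SS_between / SS_total
    return f_stat, eta_sq


# Toy latency samples for two systems (illustrative only).
f_stat, eta_sq = one_way_anova([1, 2, 3], [4, 5, 6])
print(round(f_stat, 2), round(eta_sq, 3))  # 13.5 0.771
```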

6. Quantitative Results

DeepEdgeIDS is compared with AutoDRL-IDS, a supervised DRL alternative, on the following metrics (AutoDRL-IDS values are shown only where reported):

| Metric | DeepEdgeIDS | AutoDRL-IDS |
|---|---|---|
| Accuracy | 98.0% | 94.0% |
| Precision | 92.4% | – |
| Recall | 97.6% | – |
| F1-score | 94.9% | – |
| Live accuracy | 97.0% | – |
| Live F1-score | 93.0% | – |
| Missed packets/hr | 2,640 | 12,740 |
| Peak inference latency | 9.2 ms | – |
| Response under DDoS | 0.65 s | 0.50 s |
| CPU usage (normal) | 12.1% | – |
| CPU usage (attack) | 39.6% | – |
| Peak energy (attack) | 138.4 J | – |
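Two of the tabulated figures can be cross-checked with simple arithmetic: the F1-score is the harmonic mean of precision and recall, and the missed-packet counts imply the relative reduction computed below.

```python
# Values copied from the table above.
precision, recall = 0.924, 0.976
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(f"F1 = {f1:.1%}")  # F1 = 94.9%

missed_deepedge, missed_autodrl = 2_640, 12_740  # missed packets per hour
reduction = (missed_autodrl - missed_deepedge) / missed_autodrl
print(f"Missed packets reduced by {reduction:.1%}")  # Missed packets reduced by 79.3%
```

The computed F1 matches the reported 94.9%, so the precision, recall, and F1 rows are mutually consistent.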

Under sustained DDoS, DeepEdgeIDS maintains a detection probability of 97.6% versus 92.0% for AutoDRL-IDS. The trade-off is roughly 20% higher peak CPU workload and approximately 8 J of extra energy under peak attack; the associated carbon emissions $C_t = \kappa_t E_t$ are computed from local grid carbon-intensity data.
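Converting an energy figure into emissions with $C_t = \kappa_t E_t$ requires converting joules to kWh when $\kappa_t$ is expressed in gCO$_2$e/kWh. The grid intensity of 400 gCO$_2$e/kWh below is a hypothetical value chosen for illustration, not a figure from the paper.

```python
J_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def carbon_grams(energy_j, kappa_g_per_kwh):
    """C_t = kappa_t * E_t, with E_t converted from joules to kWh."""
    return kappa_g_per_kwh * energy_j / J_PER_KWH


KAPPA = 400.0  # hypothetical grid carbon intensity, gCO2e/kWh
print(f"{carbon_grams(8.0, KAPPA):.2e} g")    # 8.89e-04 g (extra ~8 J under peak attack)
print(f"{carbon_grams(138.4, KAPPA):.2e} g")  # 1.54e-02 g (peak energy from the table)
```

Per-event emissions at the edge are tiny; the carbon term matters in aggregate across many gateways and over long deployments, and when $\kappa_t$ varies with the time of day.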

7. Comparative Adaptability and Sustainability Guarantees

DeepEdgeIDS’s unsupervised AE–DQN hybrid design is intended to enable faster adaptation to zero-day attacks by feeding anomaly scores and latent representations directly into the DRL policy's state. The authors invoke Bellman contraction and Lyapunov stability arguments to support policy convergence under nonstationary traffic patterns. Empirically, the design yields fewer missed detections (2,640 vs. 12,740 missed packets per hour) and faster adaptation than supervised DRL approaches.

The system’s explicit carbon-aware reward term ($-\zeta \cdot C_t$) and multi-objective Lagrangian optimization aim for a Pareto-optimal balance among detection rate, energy usage, and carbon emissions. The paper reports 40% lower response latency (p < 0.05) for DeepEdgeIDS at the cost of higher computational effort, with bounded carbon and energy footprints, positioning it as a sustainable option for real-time IoT edge IDS deployment (Jamshidi et al., 23 Nov 2025).
