DeepEdgeIDS: Unsupervised DRL for IoT Edge Security
- DeepEdgeIDS is an unsupervised deep reinforcement learning IDS that combines an autoencoder for anomaly detection with a deep Q-network for dynamic DDoS mitigation on IoT edge gateways.
- It employs a multi-objective reward function that balances detection accuracy, response latency, energy consumption, and carbon emissions to rapidly adapt to novel attacks.
- Experimental results on edge hardware show superior detection rates, reduced latency, and explicit carbon-aware operation compared to supervised DRL alternatives.
DeepEdgeIDS is an unsupervised Deep Reinforcement Learning (DRL)-based Intrusion Detection System (IDS) designed for sustainable and adaptive Distributed Denial-of-Service (DDoS) mitigation in Internet of Things (IoT) edge gateways. Combining an Autoencoder (AE) for anomaly detection with a Deep Q-Network (DQN) for real-time, carbon-aware mitigation policy selection, DeepEdgeIDS addresses the limitations of traditional IDS—including poor adaptability to zero-day threats, dependence on static signatures or labels, and neglect of sustainability metrics such as energy consumption and carbon footprint. Experimental results demonstrate that DeepEdgeIDS attains higher detection accuracy, faster adaptation to novel attacks, and explicit carbon-aware operation when compared to supervised DRL alternatives (Jamshidi et al., 23 Nov 2025).
1. System Architecture and Operational Pipeline
DeepEdgeIDS processes raw network traffic at the edge gateway by employing an unsupervised AE to encode each traffic flow $x$ into a latent vector $z$.
The AE reconstruction loss, i.e., the error between $x$ and its reconstruction $\hat{x}$, serves as an anomaly score. Flows whose score exceeds a threshold $\tau$ (set at the 95th percentile of scores on benign validation traffic) are flagged as suspicious. These flows are subsequently processed by the DQN policy network, whose input state combines the flow features with the AE latent representation.
The DQN provides Q-values over four discrete mitigation actions:
- rate limiting,
- SYN throttling,
- packet dropping,
- IP blacklisting.
Selected actions are enforced at the gateway via dynamic rule updates (e.g., iptables). Experience replay (buffer size: 50,000; batch size: 64) and $\epsilon$-greedy exploration promote robust and stable off-policy learning.
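The replay and exploration mechanics described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `ReplayBuffer`, the function `epsilon_greedy`, and the action labels in `ACTIONS` are hypothetical, and only the buffer size (50,000) and batch size (64) come from the text.

```python
import random
from collections import deque

# Hypothetical labels for the four mitigation actions listed in the text.
ACTIONS = ("rate_limit", "syn_throttle", "drop_packets", "blacklist_ip")


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity=50_000):
        self._items = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self._items.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64, rng=random):
        # Uniform sampling without replacement, as in standard DQN replay.
        return rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `epsilon=0.0` the selection is purely greedy, so `ACTIONS[epsilon_greedy([0.1, 0.9, 0.3, 0.2], 0.0)]` picks `"syn_throttle"`.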
2. Autoencoder-based Anomaly Detection
The AE component is trained to minimize the reconstruction loss between each input flow and its reconstruction.
During live operation, the anomaly score is computed for each traffic flow. A flow is deemed anomalous (i.e., DDoS suspected) if its score exceeds the threshold $\tau$ calibrated on benign traffic. This unsupervised anomaly detection scheme enables identification of zero-day patterns and non-stationary attack behaviors without dependence on labeled data.
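The thresholding step can be illustrated with a short sketch. It assumes a mean squared error as the reconstruction loss (the exact loss is not recoverable from the text) and linear interpolation for the percentile; all function names are hypothetical.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between a flow's features and its reconstruction.

    Assumption: MSE is used here for illustration only.
    """
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def fit_threshold(benign_scores, q=95.0):
    """Calibrate the anomaly threshold on scores from benign validation traffic."""
    return percentile(benign_scores, q)


def is_anomalous(score, tau):
    """Flag a flow as suspicious when its score exceeds the threshold."""
    return score > tau
```

By construction, roughly 5% of benign validation flows score above the calibrated threshold, which sets the expected false-alarm budget before the DQN stage.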
3. Markov Decision Process Formulation and Reward Design
The core DRL mechanism of DeepEdgeIDS is modeled as a Markov Decision Process (MDP) where states, actions, and a multi-objective reward function are jointly optimized:
- State space: the selected flow features combined with the AE latent representation.
- Action space: Four discrete edge-mitigating responses.
The multi-objective reward function rewards detection and penalizes false positives and resource costs. Its terms are:
- detection rate and false positive rate,
- response latency,
- energy consumed,
- memory utilization,
- carbon footprint, which depends on the local grid carbon intensity.
This formulation ensures continuous adaptation toward high accuracy, low resource usage, minimal latency, and explicit minimization of carbon impact at the IoT edge.
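One plausible shape for such a reward is a weighted sum, sketched below. The weighted-sum form, every weight value, the units, and the names `RewardWeights`, `carbon_footprint`, and `reward` are assumptions made for illustration; the source does not give the actual coefficients. The only physical fact used is the conversion 1 kWh = 3.6e6 J.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the actual coefficients are not given in the text.
    detection: float = 1.0
    false_positive: float = 1.0
    latency: float = 0.1
    energy: float = 0.1
    memory: float = 0.1
    carbon: float = 0.1


def carbon_footprint(energy_joules, grid_intensity_g_per_kwh):
    """Grams of CO2 for a given energy use: kWh times gCO2 per kWh."""
    return energy_joules / 3.6e6 * grid_intensity_g_per_kwh


def reward(dr, fpr, latency_s, energy_j, memory_frac, grid_intensity,
           w=RewardWeights()):
    """Reward detection; penalize false positives, latency, energy, memory, carbon."""
    carbon = carbon_footprint(energy_j, grid_intensity)
    return (w.detection * dr
            - w.false_positive * fpr
            - w.latency * latency_s
            - w.energy * energy_j
            - w.memory * memory_frac
            - w.carbon * carbon)
```

Under this sketch, the same mitigation action earns a lower reward when the grid is more carbon-intensive, which is what makes the policy carbon-aware rather than merely energy-aware.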
4. DRL Algorithmic Structure and Network Implementation
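DeepEdgeIDS employs a DQN policy, which acts greedily with respect to its learned Q-values apart from exploration steps, with hyperparameters:
- Discount factor $\gamma$,
- Learning rate $\alpha$ (Adam optimizer),
- Exploration rate $\epsilon$ annealed from 1.0 to 0.01,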
- Experience replay buffer: 50,000 transitions; batch size: 64,
- Convergence typically achieved after steps.
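For orientation, the standard DQN temporal-difference target and an exploration schedule can be sketched as below. The linear annealing shape and the function names are assumptions; the source states only the start and end values (1.0 and 0.01), and the discount factor must be supplied by the caller because its value is not recoverable here.

```python
def td_target(reward, next_q_values, gamma, done):
    """Standard one-step DQN target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def linear_epsilon(step, total_steps, eps_start=1.0, eps_end=0.01):
    """Anneal epsilon linearly from eps_start to eps_end, then hold it constant.

    Assumption: a linear schedule is used here purely for illustration.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

In a training loop, `td_target` would be evaluated on each sampled transition using the target network's Q-values for the next state.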
The network architecture comprises:
- Input: dimension matching the state vector (≈8 Bot-IoT features + latent dimension),
- Two fully connected hidden layers (e.g., 128 units, ReLU),
- Output: linear Q-values.
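The layer layout above can be sketched as a dependency-free forward pass. The latent dimension (4, giving a 12-dimensional state in the usage line), the He-style initialization, and all function names are hypothetical; a real deployment would more likely use a tensor library.

```python
import random


def make_layer(n_in, n_out, rng):
    """Dense layer as (weights, biases) with He-style random initialization."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu):
    """Affine transform followed by an optional ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def build_q_network(state_dim, hidden=128, n_actions=4, seed=0):
    """Two hidden layers of `hidden` units plus a linear output layer."""
    rng = random.Random(seed)
    return [make_layer(state_dim, hidden, rng),
            make_layer(hidden, hidden, rng),
            make_layer(hidden, n_actions, rng)]


def q_values(network, state):
    """Forward pass: ReLU on hidden layers, linear output (one Q-value per action)."""
    h = state
    for i, layer in enumerate(network):
        h = dense(h, layer, relu=(i < len(network) - 1))
    return h
```

For example, `q_values(build_q_network(state_dim=12), [0.1] * 12)` returns a list of four Q-values, one per mitigation action.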
5. Experimental Setup and Evaluation Methodology
Experiments were conducted on edge hardware (Raspberry Pi 4, 1.5 GHz quad-core, 8 GB RAM) with ESP32 IoT sensor nodes. The Bot-IoT dataset, originally 80 features, was reduced to 8 via variance analysis, Pearson correlation filtering (threshold 0.85), and recursive feature elimination (retaining 95% predictive power). The anomaly threshold was set using the AE reconstruction error distribution on benign validation traffic.
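The correlation-filtering step of the feature reduction can be sketched as follows. This assumes that 0.85 is a cutoff on the absolute Pearson coefficient and that a feature is dropped when it is too strongly correlated with one already kept; both the greedy order and the function names are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (vx * vy) ** 0.5


def drop_correlated(columns, threshold=0.85):
    """Greedily keep features whose |r| with every kept feature is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the column order determines which member of a highly correlated pair survives.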
Performance metrics and resource utilization were evaluated in both training and live deployment regimes. Statistical significance was assessed via ANOVA, highlighting differences in latency (F=75.62, p<0.05, effect size 0.577) and detection probability (F=67.89, p<0.05).
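For readers unfamiliar with the test, a one-way ANOVA F-statistic can be computed as in the sketch below, together with eta squared, a common effect-size measure for ANOVA. Whether the reported effect size is eta squared, and how the groups were formed, are assumptions; the function name is hypothetical.

```python
def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation inside each group.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, eta_sq
```

A large F indicates that between-group differences dominate within-group noise, while eta squared reports the share of total variance attributable to the grouping.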
6. Quantitative Results
DeepEdgeIDS achieves superior metrics relative to supervised DRL alternatives:
| Metric | DeepEdgeIDS | AutoDRL-IDS |
|---|---|---|
| Accuracy | 98.0% | 94.0% |
| Precision | 92.4% | — |
| Recall | 97.6% | — |
| F1-score | 94.9% | — |
| Live accuracy | 97.0% | — |
| Live F1-score | 93.0% | — |
| Missed packets/hr | 2,640 | 12,740 |
| Peak inference latency | 9.2 ms | — |
| Response under DDoS | 0.65 s | 0.50 s |
| CPU usage (normal) | 12.1% | — |
| CPU usage (attack) | 39.6% | — |
| Peak energy (attack) | 138.4 J | — |
Under sustained DDoS, DeepEdgeIDS maintains a detection probability of 97.6% versus 92.0% for AutoDRL-IDS. The trade-off is a roughly 20% higher peak CPU workload and approximately 8 J of extra energy usage under peak attack, with carbon intensity computed from local grid data.
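Two of the tabulated figures can be cross-checked with simple arithmetic: the F1-score should be the harmonic mean of precision and recall, and the missed-packet counts imply a relative reduction. The helper names below are hypothetical; the input numbers are taken from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# Precision 92.4% and recall 97.6% reproduce the reported F1 of 94.9%.
print(round(100 * f1_score(0.924, 0.976), 1))           # 94.9
# 2,640 vs 12,740 missed packets/hr is about a 79.3% reduction.
print(round(100 * relative_reduction(12740, 2640), 1))  # 79.3
```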
7. Comparative Adaptability and Sustainability Guarantees
DeepEdgeIDS’s unsupervised AE–DQN hybrid design enables faster adaptation to zero-day attacks by feeding anomaly scores and latent representations directly into the DRL policy. Bellman contraction and Lyapunov stability arguments are invoked to support policy convergence under nonstationary traffic patterns. These properties are credited with the significantly lower missed detection rates and faster adaptation relative to supervised DRL approaches.
The system’s explicit carbon-aware reward structure (the carbon footprint term) and multi-objective Lagrangian optimization aim for a Pareto-optimal balance among detection rate, energy usage, and carbon emissions. DeepEdgeIDS achieves 40% lower response latency (p<0.05) at the cost of higher computational effort, but with bounded carbon and energy footprints, which positions it as a sustainable option for real-time IoT edge IDS deployment (Jamshidi et al., 23 Nov 2025).