AutoDRL-IDS: Carbon-Aware DRL Intrusion Detection
- AutoDRL-IDS is a DRL-based intrusion detection system that uses LSTM sequence encoding and policy learning for adaptive, real-time cyberattack detection.
- It combines supervised pretraining with techniques like DQN to achieve high detection accuracy and low false alarm rates in resource-constrained edge deployments.
- The system employs a multi-objective reward design that balances traditional detection metrics with energy efficiency and reduced carbon emissions for sustainable operation.
AutoDRL-IDS denotes a class of deep reinforcement learning (DRL) intrusion detection systems that combine supervised learning and sequence modeling (typically via LSTM encoders) for efficient, adaptive cyberattack detection under real-world constraints, notably in settings such as IoT edge gateways and cyber-physical infrastructures. The defining feature is the integration of temporally aware feature encoding with policy learning, frequently enhanced by multi-objective optimization criteria that explicitly balance classical detection accuracy against sustainability metrics such as energy consumption and carbon emissions (Jamshidi et al., 23 Nov 2025, Al-Mehdhar et al., 2024).
1. Core Model Architecture
AutoDRL-IDS is fundamentally structured as a two-stage system: a feature-sequence encoder (typically a unidirectional LSTM) followed by a DRL agent, commonly instantiated as a Deep Q-Network (DQN) or an actor-critic method. The canonical LSTM encoder processes $d$-dimensional traffic features $x_t \in \mathbb{R}^d$, propagating its hidden state $h_t$ through the standard cell dynamics:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f), \qquad i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c), \qquad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(c_t)$$

where $\sigma$ is the sigmoid activation and $\odot$ denotes element-wise multiplication. The hidden state $h_t$ is used both as an embedding for the supervised classifier and as a summary input to the DRL agent.
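One step of this encoder can be sketched in NumPy as a minimal illustration; the stacked gate-weight layout, dimensions, and toy data below are assumptions for demonstration, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One standard LSTM cell step.

    W: (4*H, D+H) stacked gate weights [forget; input; candidate; output],
    b: (4*H,) stacked biases. Layout is illustrative, not from the paper.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])           # forget gate f_t
    i = sigmoid(z[H:2*H])         # input gate i_t
    g = np.tanh(z[2*H:3*H])       # candidate cell state
    o = sigmoid(z[3*H:4*H])       # output gate o_t
    c = f * c_prev + i * g        # element-wise cell update
    h = o * np.tanh(c)            # hidden state / embedding h_t
    return h, c

# Encode a toy traffic-feature sequence (D=3 features, H=4 hidden units)
rng = np.random.default_rng(0)
D, H = 3, 4
W, b = 0.1 * rng.standard_normal((4 * H, D + H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):
    h, c = lstm_step(x_t, h, c, W, b)
```

The final `h` is the embedding handed to both the classifier head and the DRL agent.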
The DQN policy receives a state vector encompassing selected raw traffic metrics (e.g., packet rate, SYN/ACK counts), the LSTM hidden state $h_t$, and optionally an anomaly score derived from an auxiliary autoencoder. The action space typically includes discrete mitigation strategies (e.g., rate-limiting, packet dropping, IP blacklisting), suitable for real-time control on edge platforms.
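Assembling this state vector is a simple concatenation; the sketch below uses hypothetical field values and action names, since the paper's exact feature list and action labels are not reproduced here:

```python
def build_state(raw_metrics, h_t, anomaly_score=None):
    """Concatenate raw traffic metrics (e.g., packet rate, SYN/ACK counts),
    the LSTM embedding h_t, and an optional autoencoder anomaly score
    into one flat DQN state vector. Field layout is illustrative."""
    state = list(raw_metrics) + list(h_t)
    if anomaly_score is not None:
        state.append(anomaly_score)
    return state

# Hypothetical discrete mitigation actions of the kind named in the text
ACTIONS = ["allow", "rate_limit", "drop_packets", "blacklist_ip"]

state = build_state(raw_metrics=[120.0, 45.0, 30.0],
                    h_t=[0.12, -0.31, 0.05, 0.44],
                    anomaly_score=0.87)
```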
2. Supervised Pretraining and DRL Integration
Supervised pretraining of the LSTM encoder is conducted on labeled datasets (e.g., Bot-IoT), optimizing a binary cross-entropy loss
$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right],$$
where $\hat{y}_i \in (0,1)$ is the sigmoid output of a classification head on the final hidden state. After convergence, LSTM weights are either frozen or lightly fine-tuned during DRL policy learning.
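The pretraining objective above reduces to a short function; this is a minimal stdlib sketch of the loss itself, not the training loop:

```python
import math

def bce_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over (label, predicted-probability) pairs,
    matching the supervised pretraining objective. eps guards log(0)."""
    n = len(y_true)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_prob)) / n
```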
At runtime, feature extraction and encoding are executed online, and the DQN agent selects the optimal action for each time step by maximizing $Q(s_t, a; \theta)$ over the discrete action set, given the augmented state $s_t$. Bellman updates are applied over experience replay buffers, with a target network $\theta^-$ for stability:
$$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-), \qquad \mathcal{L}(\theta) = \left(y_t - Q(s_t, a_t; \theta)\right)^2.$$
Experiments typically use the Adam optimizer, batch sizes of 64, and replay buffers of up to 50,000 samples.
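The Bellman target and replay sampling described above can be sketched as follows; the transitions and Q-values are toy stand-ins, while the buffer capacity and batch size mirror the quoted hyperparameters:

```python
import random
from collections import deque

def td_target(reward, next_q, gamma=0.99, done=False):
    """Bellman target y = r + gamma * max_a' Q_target(s', a');
    bootstrapping is cut at episode boundaries."""
    return reward if done else reward + gamma * max(next_q)

# Replay buffer with the capacities quoted in the text (50k transitions, batch 64)
buffer = deque(maxlen=50_000)
random.seed(0)
for t in range(1_000):                # toy transitions (state-id, reward)
    buffer.append((t, random.random()))
batch = random.sample(buffer, 64)     # uniform minibatch for a DQN update
```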
3. Carbon-Aware and Multi-Objective Reward Design
AutoDRL-IDS incorporates a multi-objective reward function to enforce trade-offs between detection efficacy and sustainability. The reward at time $t$ is a weighted combination
$$R_t = w_1 R^{\mathrm{det}}_t - w_2 FP_t - w_3 L_t - w_4 E_t - w_5 M_t - w_6 C_t,$$
with $R^{\mathrm{det}}_t$ (detection reward), $FP_t$ (false positives), $L_t$ (latency), $E_t$ (energy cost), $M_t$ (memory utilization ratio), and $C_t$ (carbon emission) defined as:
- $E_t = P_t \cdot \Delta t$ (energy, J)
- $C_t = E_t \cdot \kappa$, $\kappa$ (carbon intensity, gCO$_2$ per J)
A simplified formulation used in practice retains only the detection, energy, and carbon terms:
$$R_t = R^{\mathrm{det}}_t - \lambda_E E_t - \lambda_C C_t.$$
This structure allows practitioners to tune policy sensitivity according to the security and sustainability constraints of the deployment context (Jamshidi et al., 23 Nov 2025).
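The carbon-aware reward is a plain weighted sum, which a few lines suffice to implement; the weight vector below is an illustrative placeholder, not the paper's calibrated values:

```python
def carbon_emission(energy_j, intensity_g_per_j):
    """C_t = E_t * carbon intensity (gCO2 per joule)."""
    return energy_j * intensity_g_per_j

def reward(det, fp, latency, energy_j, mem_ratio, carbon_g,
           w=(1.0, 0.5, 0.1, 0.01, 0.1, 0.05)):
    """Multi-objective reward: detection reward minus weighted penalties for
    false positives, latency, energy, memory, and carbon. Weights w are
    hypothetical tuning knobs for the deployment context."""
    return (w[0] * det - w[1] * fp - w[2] * latency
            - w[3] * energy_j - w[4] * mem_ratio - w[5] * carbon_g)
```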
4. Application Domains and Deployment Results
AutoDRL-IDS implementations have demonstrated efficacy across domains. In EV charging station cyber-defense, the system couples a DRL adversary (LSTM or Transformer policy-gradient agent) that synthesizes stealthy state-of-charge (SoC) attacks with a robust PPO-based IDS. A hierarchical adversarial training architecture is used: the attack generator produces adversarial datasets by maximizing its own utility against a fixed baseline IDS, while the IDS is trained to minimize detection loss on these challenging samples, yielding a saddle-point optimization. This scheme achieved near-perfect detector generalization: on real-world traces (536 taxis, 24 days), Transformer-based IDS models maintained 99.9% accuracy with a 0.5% false-alarm rate, including under attacks not seen in training (Al-Mehdhar et al., 2024).
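The hierarchical alternation can be expressed as a training skeleton; `attacker_step` and `ids_step` below are placeholders standing in for the two inner (policy-gradient and PPO) optimizations, and the toy "loss" is purely illustrative:

```python
def hierarchical_adversarial_training(rounds, attacker_step, ids_step):
    """Saddle-point alternation: the attacker maximizes its utility against a
    frozen IDS, then the IDS minimizes detection loss on the resulting
    adversarial dataset. Both step functions are hypothetical stand-ins."""
    history = []
    for r in range(rounds):
        adv_data = attacker_step(r)   # generate stealthy SoC-attack traces
        loss = ids_step(adv_data)     # harden the IDS on those traces
        history.append(loss)
    return history

# Toy stand-ins: the IDS "loss" shrinks each round as the detector adapts
hist = hierarchical_adversarial_training(
    rounds=3,
    attacker_step=lambda r: [r],
    ids_step=lambda data: 1.0 / (1 + data[0]),
)
```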
On Raspberry Pi 4/ESP32 gateways for IoT edge DDoS detection, the supervised LSTM-DQN AutoDRL-IDS reached 94% accuracy and an F1 score of 0.913 (with correspondingly high precision and recall), with a mean response time of $0.50$ s and significant reductions in energy (20–30%) and carbon (10–15%) versus traditional RL-IDS, at $0.03$ kgCO$_2$ per five-minute detection window (Jamshidi et al., 23 Nov 2025).
| Domain | Architecture | Accuracy | F1 | Notable Metrics |
|---|---|---|---|---|
| EV Charging Cyberdefense | Transformer–PPO2 | 0.999 | 0.999 | 0.5% FAR, robust to unseen attacks |
| IoT Edge Gateway | LSTM–DQN | 0.94 | 0.913 | 0.03 kgCO$_2$/5 min, 0.5 s latency |
5. Comparative Analysis and Limitations
Relative to unsupervised DRL-IDS (e.g., DeepEdgeIDS), the supervised AutoDRL-IDS variant provides the highest-precision detection for attack types represented in labeled datasets. The LSTM-based temporal encoding enhances the ability to recognize evolving traffic patterns and supports smoother DRL policy updates. The introduction of carbon-aware rewards represents a distinct advance by explicitly incentivizing sustainability in operational deployments, which is crucial for edge platforms with limited energy budgets.
Limitations include reduced adaptability to zero-day attacks compared to unsupervised or self-supervised hybrids and reliance on labeled data, which may hinder transferability to new environments. Periodic DRL updates can incur resource overhead even in static regimes—a plausible implication is the benefit of event-triggered or adaptive DRL retraining strategies. Model compression (pruning, quantization) and federated DRL distribution are suggested avenues for future work to mitigate resource and scalability constraints (Jamshidi et al., 23 Nov 2025).
6. Implementation and Replication Details
Experimental configurations adopt standard open-source toolchains. LSTM pretraining uses the Adam optimizer (batch size 64, 10–20 epochs). DRL training leverages DQN agents with $\epsilon$-greedy exploration (annealed from 1.0 to 0.1), a discount factor $\gamma$, a replay buffer of up to 50k samples, and periodic target-network updates. For adversarial EV-charging scenarios, training alternates adversary and IDS episodes over fixed-length time-slot horizons; the IDS trains using SGD, with the best-performing learning rate chosen empirically, and PPO clipping for policy updates (Al-Mehdhar et al., 2024).
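The quoted exploration schedule (annealed from 1.0 to 0.1) can be sketched as a linear annealing function; the total-step horizon below is an arbitrary illustrative choice:

```python
def epsilon(step, total_steps, eps_start=1.0, eps_end=0.1):
    """Linear epsilon-greedy annealing from eps_start to eps_end over
    total_steps, then held constant, per the schedule described above."""
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```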
Synthetic dataset balancing (e.g., via ADASYN) is applied to ensure class parity. Hardware evaluations on Raspberry Pi 4 and ESP32 confirm real-time operation under resource and energy constraints for edge IoT use cases, with modest CPU utilization and memory footprint (Jamshidi et al., 23 Nov 2025).
7. Broader Impact and Future Directions
AutoDRL-IDS extends the practical frontier of DRL for cyber-physical anomaly detection by jointly optimizing for detection quality and environmental sustainability—addressing a growing need in resource-constrained IoT infrastructures and cyber-physical systems. Carbon-aware objectives, lightweight LSTM encoding, and DQN policy architectures make it a candidate for broad adoption in green security frameworks.
Potential enhancements include incorporation of meta-learning or semi-supervised DRL to elevate adaptivity to evolving or unseen threats, federated training schemes for distributed IoT deployments, and rigorous model compression for extreme edge/embedded platforms. A plausible implication is that advances in reward shaping and multi-agent adversarial training will strengthen robustness and generalization of IDS policies against increasingly sophisticated attack strategies.
References:
- "Charging Ahead: A Hierarchical Adversarial Framework for Counteracting Advanced Cyber Threats in EV Charging Stations" (Al-Mehdhar et al., 2024)
- "Carbon-Aware Intrusion Detection: A Comparative Study of Supervised and Unsupervised DRL for Sustainable IoT Edge Gateways" (Jamshidi et al., 23 Nov 2025)