A Hybrid CNN-LSTM Intrusion Detection Framework for Cybersecurity in Smart Renewable Energy Grids

Published 23 Jun 2026 in cs.LG and cs.AI | (2606.25200v1)

Abstract: The accelerated digitalization of renewable energy smart grids through IoT sensors, AMI, and SCADA systems has significantly expanded the attack surface for sophisticated cyberattacks, notably FDI attacks that stealthily distort state estimation and DoS/DDoS attacks that flood communication channels. Current IDS, however, exhibit three inherent limitations: inadequate modeling of the temporal progression of multi-step attacks, degraded scalability under the extremely skewed class distributions of standard benchmark datasets, and restricted generalization across heterogeneous network environments. In this study, we present a Hybrid CNN-LSTM IDS that jointly exploits CNN-based spatial feature extraction and LSTM-based temporal sequence modeling, enabling real-time detection of both instantaneous volumetric anomalies and gradually evolving low-and-slow attack campaigns. The model was trained using a seven-step preprocessing workflow comprising missing-value imputation, min-max normalization, one-hot encoding, SMOTE class balancing, mutual-information feature selection, causal temporal sequence construction (T=10), and stratified partitioning. … LSTM (96.1%), Random Forest (93.5%), SVM (91.2%), and KNN (89.7%); on NSL-KDD, it reaches 98.2% precision versus 96.4% (LSTM), 95.2% (CNN), 92.7% (Random Forest), and 90.8% (SVM), with margins of 2–9 percentage points across all measures. An ablation analysis identified SMOTE balancing as the most influential design choice (−3.7 pp F1 without it). The model achieves a real-time inference throughput of 27,800 flows/s on GPU and 0.082 ms/sample CPU latency in FP32, with INT8 quantization providing an additional 3.1× speedup at 0.3% accuracy loss, confirming deployment feasibility on resource-constrained IEDs with <128 MB memory and establishing a deployable deep-learning framework for securing next-generation renewable energy smart grid infrastructure.

Authors (2)

Summary

  • The paper introduces a hybrid CNN-LSTM framework that effectively detects multi-stage cyber-attacks in smart renewable energy grids through real-time passive monitoring.
  • It employs a seven-stage preprocessing pipeline, including SMOTE and mutual information feature selection, to address severe class imbalance and enhance detection accuracy.
  • Experiments on the CICIDS2017 and NSL-KDD datasets demonstrate up to 98.7% accuracy, fast convergence, and low computational cost.

Hybrid CNN-LSTM Intrusion Detection for Smart Renewable Energy Grids

Introduction

The proliferation of IoT, AMI, and SCADA systems within renewable energy smart grids expands the cyber-attack surface, exposing grid infrastructure to attacks such as False Data Injection (FDI), which can subvert state estimation, and Denial-of-Service (DoS/DDoS), which can cripple grid communication networks. Existing IDS approaches often neglect the temporal evolution of complex multi-stage attacks, handle the severe class imbalance endemic to operational network traffic poorly, and fail to generalize across heterogeneous deployment environments. This work addresses these limitations with a Hybrid CNN-LSTM IDS framework designed for passive deployment at the boundary between the smart grid metering and communication domains, explicitly targeting the threat surface of modern smart grids (Figure 1).

Figure 1: Five-layer smart renewable energy grid architecture with the proposed CNN-LSTM IDS deployed as a passive tap at the Layer 3/4 boundary.

Methodology

System Design and Threat Model

The system assumes a five-layer grid architecture. The IDS is passively situated at the metering/communication boundary, where it acquires unfiltered measurement traffic in real time without adding latency to grid operations. The threat model incorporates advanced persistent threat adversaries capable of FDI, DoS/DDoS, brute force, botnet/malware, and electricity theft, with particular emphasis on attacks that exploit topology knowledge to evade traditional residual-based bad data detection. The IDS must therefore detect both burst-like volumetric attacks and temporally coordinated, stealthy campaigns.
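
The stealthy FDI case in this threat model can be illustrated with the classic construction in which an attacker who knows the linearized measurement matrix H injects a = Hc: the least-squares state estimate shifts by exactly c, while the measurement residual (the quantity residual-based bad-data detectors threshold) does not change. The sketch assumes a random 12×4 matrix, ordinary unweighted least squares, and made-up attack values; it illustrates the principle and is not the paper's grid model.

```python
import numpy as np

def ls_estimate(H, z):
    """Ordinary least-squares state estimate x_hat = argmin ||z - H x||_2."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return x_hat

def residual_norm(H, z):
    """Norm of the measurement residual used by classic bad-data detection."""
    return float(np.linalg.norm(z - H @ ls_estimate(H, z)))

rng = np.random.default_rng(0)
m, n = 12, 4                                 # hypothetical: 12 meters, 4 state variables
H = rng.normal(size=(m, n))                  # hypothetical linearized measurement matrix
x_true = rng.normal(size=n)
z = H @ x_true + 0.01 * rng.normal(size=m)   # noisy measurements

c = np.array([0.5, -0.2, 0.0, 0.3])          # state error the attacker wants to inject
a_stealthy = H @ c                           # FDI vector built from topology knowledge
a_naive = np.zeros(m)
a_naive[0] = 2.0                             # crude tampering of a single meter

r_clean = residual_norm(H, z)
r_stealthy = residual_norm(H, z + a_stealthy)
r_naive = residual_norm(H, z + a_naive)
shift = ls_estimate(H, z + a_stealthy) - ls_estimate(H, z)

print(f"residual clean={r_clean:.4f} stealthy={r_stealthy:.4f} naive={r_naive:.4f}")
print("state shift injected by stealthy attack:", np.round(shift, 6))
```

A single tampered meter inflates the residual, whereas the topology-aware vector leaves it unchanged; this is the gap the threat model asks the IDS to cover.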

Data Pipeline and Preprocessing

A seven-stage preprocessing pipeline is used: missing-value imputation, min-max normalization, one-hot encoding of categorical features, aggressive class rebalancing with SMOTE, mutual-information-based feature selection (top 40 features), causal temporal window construction (T = 10), and stratified partitioning. The pipeline is designed to neutralize data imbalance (especially for rare attack types) and align feature spaces across datasets, mitigating the risk of overfitting and information leakage.
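
A minimal sketch of three of these stages (imputation, min-max scaling, and causal windowing) follows. Median imputation, the helper names, and the labeling rule (each window of T = 10 consecutive flow records takes the label of its last record) are assumptions for illustration; one-hot encoding, SMOTE, feature selection, and stratified partitioning are omitted.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs in each column by that column's median (one simple imputation choice)."""
    X = X.astype(float).copy()
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

def fit_min_max(X):
    """Per-feature (min, range) from the data passed in (ideally the training split)."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0            # constant features map to 0 instead of dividing by 0
    return lo, span

def min_max(X, lo, span):
    return (X - lo) / span

def causal_windows(X, y, T=10):
    """Stack the T consecutive records ending at step t; the window's label is y[t].
    Only past and current records enter each window, so no future information leaks in."""
    idx = np.arange(T - 1, X.shape[0])
    windows = np.stack([X[t - T + 1 : t + 1] for t in idx])   # shape (n - T + 1, T, F)
    return windows, y[idx]

# Toy usage: 20 flow records with 2 numeric features and some missing values.
X = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0], [4.0, np.nan]] * 5)
y = np.arange(20) % 2
X = impute_median(X)
lo, span = fit_min_max(X)
Xs = min_max(X, lo, span)
W, yw = causal_windows(Xs, y, T=10)
print(W.shape, yw.shape)  # (11, 10, 2) (11,)
```

Because each window ends at the record it labels, no future records enter a window; fitting the scaler on the training portion only is a common way to keep test statistics out of training as well.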

Hybrid CNN-LSTM Architecture

The architecture combines two stacked 1D convolutional layers for spatial anomaly extraction with a 128-unit LSTM for sequence modeling, followed by a multi-class softmax output. The CNN compresses local feature co-occurrences into embeddings, which are arranged into sequences (T = 10) for LSTM processing, enabling the model to exploit both local feature patterns and long-range temporal dependencies (Figure 2).
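
To make the data flow concrete, here is a NumPy forward pass under one plausible reading of the architecture: each window of T = 10 steps with 40 features passes through two causal 1D convolutions along the time axis (64 and 128 filters, kernel size 3, ReLU), whose per-step outputs already form the sequence a 128-unit LSTM consumes; the final hidden state feeds a softmax. The filter counts, kernel size, 15 output classes, and random weights are all hypothetical, and batch normalization and dropout are left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)
T, F, C1, C2, H, K, N_CLASSES = 10, 40, 64, 128, 128, 3, 15   # hypothetical sizes

def conv1d_causal(x, W, b):
    """x: (T, C_in), W: (K, C_in, C_out). Left-pads K-1 steps so output t sees only steps <= t."""
    k = W.shape[0]
    xp = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    return np.stack([np.einsum("kc,kco->o", xp[t:t + k], W) for t in range(x.shape[0])]) + b

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(seq, Wx, Wh, b):
    """Single-layer LSTM, gates stacked as [input, forget, cell, output]; returns final h."""
    h = np.zeros(Wh.shape[0])
    c = np.zeros(Wh.shape[0])
    for x_t in seq:
        i, f, g, o = np.split(x_t @ Wx + h @ Wh + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

params = {
    "W1": rng.normal(0, 0.1, (K, F, C1)), "b1": np.zeros(C1),
    "W2": rng.normal(0, 0.05, (K, C1, C2)), "b2": np.zeros(C2),
    "Wx": rng.normal(0, 0.05, (C2, 4 * H)), "Wh": rng.normal(0, 0.05, (H, 4 * H)),
    "bl": np.zeros(4 * H),
    "Wo": rng.normal(0, 0.05, (H, N_CLASSES)), "bo": np.zeros(N_CLASSES),
}

def forward(window, p):
    """window: (T, F) scaled flow features -> class probabilities of shape (N_CLASSES,)."""
    a1 = relu(conv1d_causal(window, p["W1"], p["b1"]))     # (T, C1) local co-occurrences
    a2 = relu(conv1d_causal(a1, p["W2"], p["b2"]))         # (T, C2): the sequence for the LSTM
    h_T = lstm_last_hidden(a2, p["Wx"], p["Wh"], p["bl"])  # temporal context over T steps
    return softmax(h_T @ p["Wo"] + p["bo"])

probs = forward(rng.random((T, F)), params)
print(probs.shape, round(float(probs.sum()), 6))
```

With trained weights, `probs` would be the class distribution for one window, and its argmax the predicted traffic class.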

Figure 2: Proposed Hybrid CNN-LSTM IDS architecture with dual 1D-convolutional blocks, a reshape bridge, a 128-unit LSTM, and a multi-class softmax output.

This configuration, comprising approximately 166K parameters, is optimized with Adam and regularized with batch normalization and dropout to limit overfitting despite aggressive oversampling. The architecture is evaluated on the CICIDS2017 and NSL-KDD datasets, covering both contemporary and legacy traffic profiles.
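
The parameter count can be sanity-checked with the standard layer formulas. The configuration below is hypothetical (40 input features, Conv1D layers with 64 and 128 filters of kernel size 3, an LSTM with 128 units and a single Keras-style bias vector, and a 15-class softmax); it happens to total close to 166K, but that does not establish it as the authors' configuration. Batch-normalization parameters are ignored.

```python
def conv1d_params(k, c_in, c_out):
    return k * c_in * c_out + c_out                # kernel weights + one bias per filter

def lstm_params(d_in, units):
    return 4 * (units * (d_in + units) + units)    # 4 gates, single bias vector per gate

def dense_params(d_in, d_out):
    return d_in * d_out + d_out

# Hypothetical layer sizes, chosen only for illustration.
layers = {
    "conv1": conv1d_params(3, 40, 64),
    "conv2": conv1d_params(3, 64, 128),
    "lstm": lstm_params(128, 128),
    "softmax": dense_params(128, 15),
}
total = sum(layers.values())
for name, count in layers.items():
    print(f"{name:8s}{count:>8,d}")
print(f"{'total':8s}{total:>8,d}")
```

Under these assumptions the LSTM holds roughly 79% of the weights, so its width is the main lever if the model must shrink further for edge targets.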

Full Pipeline and Training

The operational workflow encompasses raw traffic ingestion, data cleaning, SMOTE-based oversampling, temporal embedding, CNN-LSTM training, and INT8-quantized deployment on both x86 and ARM-based targets (Figure 3).

Figure 3: End-to-end workflow of the proposed CNN-LSTM IDS framework from data collection through preprocessing, training, and edge deployment.

Experimental Evaluation

Temporal Sequence Justification

Empirical analysis of flow-rate time series reveals that certain attacks (e.g., DoS Hulk) manifest as local, abrupt spikes that CNN spatial filters detect efficiently, whereas low-and-slow attacks (e.g., DoS Slowloris) require sequential modeling that a pure CNN does not provide. This contrast supports a hybrid approach (Figure 4).

Figure 4: Temporal flow-rate patterns for benign, DoS Hulk, and DoS Slowloris traffic, illustrating the need for both CNN spatial and LSTM temporal modeling.

Convergence and Training Stability

Training and validation curves show monotonic convergence to 98.4% validation accuracy, with negligible overfitting attributed to the combination of batch normalization and dropout (Figure 5).

Figure 5: Training and validation accuracy of the proposed CNN-LSTM on CICIDS2017. The model converges to 98.4% validation accuracy at epoch 51 with a consistently narrow train–validation gap.

Figure 6: Training and validation loss of the proposed CNN-LSTM on CICIDS2017. Cross-entropy decreases from 1.45 to 0.06 over 51 epochs with a negligible train–validation gap, confirming no overfitting.

Hyperparameter Sensitivity

Ablation and sensitivity analyses identify the optimal learning rate and show that SMOTE is the most critical factor: removing it lowers F1 by 3.7 pp. Removing individual architectural components causes smaller drops, indicating that the dual CNN blocks and the LSTM each contribute additional robustness (Figure 7).

Figure 7: Learning rate sensitivity analysis over η ∈ {10⁻⁴, …, 10⁻²}. Accuracy and F1-score both peak at η = 0.001, confirming it as the optimal operating point.

Figure 8: Ablation study showing the F1-score drop from removing each component. SMOTE balancing causes the largest single drop (−3.7pp), confirming preprocessing quality as the dominant performance factor.
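
Since SMOTE is singled out as the most influential component, its core step is worth spelling out: each synthetic minority sample is a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal NumPy version with k = 5 and toy Gaussian data; library implementations offer options this one lacks, and the paper's exact SMOTE settings are not stated here.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours (Euclidean)."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                        # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]          # (n_min, k) neighbour indices
    base = rng.integers(0, len(X_min), size=n_new)     # which minority sample to start from
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                       # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy usage: 500 majority vs. 20 minority samples, oversampled to a 1:1 ratio.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(500, 4))
X_minor = rng.normal(3.0, 0.5, size=(20, 4))
X_syn = smote(X_minor, n_new=480, k=5, rng=rng)
X_bal = np.vstack([X_major, X_minor, X_syn])
y_bal = np.r_[np.zeros(500), np.ones(20 + 480)]
print(X_bal.shape, np.bincount(y_bal.astype(int)))  # (1000, 4) [500 500]
```

A common precaution is to oversample only the training portion, so that synthetic points derived from evaluation samples never appear in the test set.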

Comparative Results

The hybrid CNN-LSTM surpasses all evaluated baselines (SVM, Random Forest, KNN, standalone CNN, standalone LSTM) by 2.6–9.0 percentage points across accuracy, F1, and AUC, attaining 98.7% accuracy, 98.0% F1, and 0.995 AUC-ROC on CICIDS2017 (Figure 9).

Figure 9: ROC curves for all six models on CICIDS2017. The CNN-LSTM achieves AUC = 0.995, with the largest margin over baselines in the low-FPR region critical for operational deployment.
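
AUC-ROC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which gives a compact way to compute it, along with the TPR reachable at a capped FPR (the low-FPR region the caption emphasizes). The sketch is binary; in the one-vs-rest multi-class setting, the positives for class c would be the samples labeled c and the scores the model's probability for c. How the paper averages across classes is not specified here.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Binary AUC-ROC as the probability that a random positive outranks a random
    negative (Mann-Whitney U statistic), with ties counted as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def tpr_at_fpr(y_true, scores, max_fpr=0.01):
    """Best TPR over thresholds 'positive if score >= t' whose FPR stays <= max_fpr."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    best = 0.0
    for t in np.unique(scores):
        if (neg >= t).mean() <= max_fpr:
            best = max(best, float((pos >= t).mean()))
    return best

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(tpr_at_fpr([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], max_fpr=0.0))
```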

The performance advantage persists under class imbalance: minority classes (e.g., infiltration, web attack, botnet) achieve strong F1, except where too few ground-truth samples exist for the score to be statistically meaningful. False-negative rates remain higher than false-positive rates, indicating that the model's primary error mode is missed attacks rather than alert overload (Figure 10).

Figure 10: Multi-class Precision-Recall curves (one-vs-rest) on CICIDS2017. The CNN-LSTM achieves AP = 0.983, maintaining higher precision than all baselines particularly above recall = 0.6.

Figure 11: Precision-recall-F1 tradeoff versus classification threshold for the proposed CNN-LSTM. The optimal threshold τ* ≈ 0.48 maximizes F1-score and serves as the operational calibration reference.
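
A threshold such as τ* can be located with a plain sweep: for each candidate τ, flag a flow as an attack when its score is at least τ and compute F1 from the resulting counts. The sketch assumes a binary attack score (for example, one minus the predicted benign probability) and uses eight made-up scores, so the threshold it finds says nothing about the paper's τ* ≈ 0.48.

```python
import numpy as np

def f1_threshold_sweep(y_true, scores, thresholds):
    """Return (best_tau, best_f1, f1_per_tau) for the rule 'attack if score >= tau'."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    f1s = []
    for tau in thresholds:
        pred = scores >= tau
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = np.sum(~pred & y_true)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)   # F1 = 2TP / (2TP + FP + FN)
    f1s = np.array(f1s)
    best = int(np.argmax(f1s))
    return float(thresholds[best]), float(f1s[best]), f1s

y = [0, 0, 0, 0, 1, 1, 1, 1]
s = [0.05, 0.22, 0.47, 0.58, 0.41, 0.63, 0.72, 0.91]   # toy attack scores
taus = np.linspace(0.05, 0.95, 19)                     # candidate thresholds, step 0.05
best_tau, best_f1, curve = f1_threshold_sweep(y, s, taus)
print(f"tau*={best_tau:.2f}  F1={best_f1:.3f}")
```

A finer grid, or the distinct score values themselves, can replace the fixed 0.05 step when more precision is needed.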

Figure 12: False-positive and false-negative rates per attack class on CICIDS2017. False negative rates exceed false positive rates across all classes, indicating residual errors are missed attacks rather than false alarms.
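
Per-class FPR and FNR follow from the multi-class confusion matrix by treating each class one-vs-rest. The three-class matrix below (benign, DoS, botnet) uses invented counts purely to show the arithmetic.

```python
import numpy as np

def per_class_fpr_fnr(cm):
    """cm[i, j] = number of samples of true class i predicted as class j.
    Returns one-vs-rest (FPR, FNR) arrays, one entry per class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # class-i samples predicted as another class
    fp = cm.sum(axis=0) - tp          # other samples predicted as class i
    tn = cm.sum() - tp - fn - fp
    fpr = fp / np.maximum(fp + tn, 1)
    fnr = fn / np.maximum(fn + tp, 1)
    return fpr, fnr

# Hypothetical counts; rows = true class, columns = predicted class.
cm = [[950, 5, 5],     # benign
      [10, 85, 5],     # DoS
      [8, 2, 30]]      # botnet
fpr, fnr = per_class_fpr_fnr(cm)
for name, a, b in zip(["benign", "DoS", "botnet"], fpr, fnr):
    print(f"{name:7s} FPR={a:.4f} FNR={b:.4f}")
```

Because the FPR denominator for a rare attack class includes all other traffic, per-class FPRs tend to be small next to FNRs, so reading absolute FP and FN counts alongside the rates gives a fuller picture of alert volume.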

Model Convergence and Generalization

Convergence is consistently faster and more stable for the hybrid model than for the standalone architectures, reaching its best validation accuracy about 5 epochs earlier than the standalone LSTM. Cross-dataset evaluation shows a small generalization gap (0.5 pp) between CICIDS2017 and NSL-KDD, with the CNN-LSTM achieving the best performance of all evaluated models on both datasets (Figure 13).

Figure 13: Validation accuracy convergence of CNN, LSTM, and the proposed CNN-LSTM over 51 epochs on CICIDS2017. The hybrid model leads from epoch 1 and converges ~5 epochs faster than standalone LSTM.

Edge Feasibility and Computational Implications

The architecture remains efficient, with inference throughput exceeding 27,800 flows/s on GPU and a CPU latency of 0.082 ms/sample (batch = 100). INT8 post-training quantization yields a further 3.1× speedup with only a 0.3% accuracy loss, supporting deployment on constrained IED/RTU nodes (<128 MB memory). Its computational cost is also lower than that of transformer-based alternatives while delivering state-of-the-art performance, making its resource profile practical for grid settings where real-time response is paramount.
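
The basic mechanics of INT8 post-training quantization can be sketched as symmetric per-tensor weight quantization: store int8 values q and one float scale s so that w ≈ s·q. This is an assumed minimal scheme for illustration; the paper does not state its quantization toolchain here, and production flows usually also calibrate activations and may use per-channel scales. The random 128×512 matrix is a stand-in for an LSTM input kernel.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
W = rng.normal(0, 0.05, size=(128, 512)).astype(np.float32)   # stand-in weight matrix
q, s = quantize_int8(W)
err = float(np.abs(dequantize(q, s) - W).max())                # rounding error <= s / 2
print(f"scale={s:.6f}  max abs error={err:.6f}  bytes fp32={W.nbytes:,} int8={q.nbytes:,}")
```

Storing weights in INT8 cuts their memory by 4×; at roughly 166K parameters that is about 0.66 MB in FP32 versus about 0.17 MB in INT8, far below the 128 MB budget.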

Theoretical and Practical Implications

This study empirically substantiates the necessity of simultaneous spatial and temporal modeling for IDS in smart grid environments. The sharp performance gains from SMOTE and mutual-information feature selection highlight the enduring importance of careful preprocessing in realistic, highly skewed network environments. By attaining high average-case and minority-class detection metrics without generating alert storms, the proposed framework meets stringent operational requirements for scalable smart grid deployment.

Practically, the demonstrated effectiveness and edge deployability directly address critical requirements in modern grid monitoring. Deploying such models at the measurement/communication boundary supports both centralized and distributed defense-in-depth strategies and suggests that integration with federated, privacy-preserving grid analytics is feasible.

Future Directions

Key research vectors include:

  • Federated IDS frameworks for privacy-preserving, cross-domain model updates.
  • Transformer and attention mechanisms for extending temporal dependencies beyond LSTM horizons.
  • Graph neural networks for topologically aware intrusion detection and attack localization.
  • Adversarial robustness assessment and defense integration.
  • Explainable AI (XAI) methods such as SHAP and LIME for operator trust and regulatory compliance.
  • Online/continual learning for adaptation to novel threats without full retraining.
  • Integration of real-world, SCADA-native datasets to bridge the gap from synthetic/benchmark data to production environments.

Limitations

Limitations arise primarily from the reliance on benchmark datasets whose class characteristics and operational variety may not fully represent real smart grid deployments. SMOTE, while addressing imbalance, may generate synthetic samples that do not capture novel adversary strategies. Static offline training precludes adaptation under rapidly evolving adversarial conditions. Finally, computational costs, though moderate, exceed those of classical techniques, potentially constraining application on extremely low-power nodes.

Conclusion

This work demonstrates that a hybrid CNN-LSTM architecture, underpinned by a rigorous preprocessing and balancing pipeline, delivers state-of-the-art accuracy, robustness to severe class imbalance, and real-time inference performance for IDS in smart renewable energy grid domains. The framework provides an immediately deployable template for next-generation grid cybersecurity, with the flexibility to integrate emerging advances in deep sequence modeling, federated analytics, and automated interpretability. Future research must address federated adaptation, explainability, and adversarial robustness to further harmonize IDS capabilities with evolving operational and threat landscapes.