Real-Time Surrogate Modeling for Fast Transient Prediction in Inverter-Based Microgrids Using CNN and LightGBM

Published 31 Mar 2026 in eess.SY and cs.LG | (2603.29255v1)

Abstract: Real-time monitoring of inverter-based microgrids is essential for stability, fault response, and operational decision-making. However, electromagnetic transient (EMT) simulations, required to capture fast inverter dynamics, are computationally intensive and unsuitable for real-time applications. This paper presents a data-driven surrogate modeling framework for fast prediction of microgrid behavior using convolutional neural networks (CNN) and Light Gradient Boosting Machine (LightGBM). The models are trained on a high-fidelity EMT digital twin dataset of a microgrid with ten distributed generators under eleven operating and disturbance scenarios, including faults, noise, and communication delays. A sliding-window method is applied to predict important system variables, including voltage magnitude, frequency, total active power, and voltage dip. The results show that model performance changes depending on the type of variable being predicted. The CNN demonstrates high accuracy for time-dependent signals such as voltage, with an $R^2$ value of 0.84, whereas LightGBM shows better performance for structured and disturbance-related variables, achieving an $R^2$ of 0.999 for frequency and 0.75 for voltage dip. A combined CNN+LightGBM model delivers stable performance across all variables. Beyond accuracy, the surrogate models also provide major improvements in computational efficiency. LightGBM achieves more than $1000\times$ speedup and runs faster than real time, while the hybrid model achieves over $500\times$ speedup with near real-time performance. These findings show that data-driven surrogate models can effectively represent microgrid dynamics. They also support real-time and faster-than-real-time predictions. As a result, they are well-suited for applications such as monitoring, fault analysis, and control in inverter-based power systems.


Authors (3)

Summary

The paper demonstrates that a hybrid model combining CNN and LightGBM can accurately predict transient states in inverter-based microgrids, with R² reaching 0.999 for frequency.
Methodology leverages a digital twin for dataset synthesis under multiple disturbance scenarios, optimizing model selection for different microgrid variables.
The surrogate approach delivers over 500× speedup compared to traditional EMT simulations while maintaining robust predictions even under out-of-distribution conditions.

Surrogate Modeling for Fast Microgrid Transient Prediction with CNN and LightGBM

Introduction

The transition toward power-electronics-dominated microgrids, driven by high penetration of distributed energy resources (DERs), imposes stringent requirements on accurate and fast dynamic simulation. Classical electromagnetic transient (EMT) simulation remains the gold standard for detailed analysis, especially given the fast response and tight coupling of inverter-based systems. However, EMT methods are too computationally intensive for real-time applications, which hinders their use in closed-loop online monitoring and control. The paper "Real-Time Surrogate Modeling for Fast Transient Prediction in Inverter-Based Microgrids Using CNN and LightGBM" (2603.29255) develops a data-driven surrogate modeling framework that uses convolutional neural networks (CNNs) and Light Gradient Boosting Machine (LightGBM) for efficient, real-time transient prediction, trained on high-fidelity microgrid simulations.

System Architecture and Dataset Generation

The study employs a ten-unit inverter-based microgrid model with detailed EMT-level representation (1 µs resolution), constructed in MATLAB/Simulink. Each distributed generator (DG) comprises a generic power source, DC-link, voltage-source inverter with PWM, LCL output filter, and coupling transformer, under the supervision of a local controller. This arrangement offers the requisite controllability and fidelity for capturing realistic inverter-dominated dynamics.

Figure 1: Schematic of a typical inverter-based DG, detailing energy source, DC-link, VSI, passive interface, and feedback-driven control.

Dataset synthesis leverages a digital twin of the microgrid to produce multi-channel measurements under eleven scenario-driven operating modes—ranging from faults and generator trips to noise injection and communication delays—thereby covering both physical and cyber-physical contingencies.

Figure 2: Workflow for digital-twin-based dataset creation, including disturbance scenario simulation and time-synchronized variable extraction.

Three-phase voltages, currents, power outputs, and system frequency are processed using a sliding-window extraction to construct rich temporal inputs. Variable derivation (e.g., voltage magnitude, aggregated power flows, voltage dips) augments base features, facilitating prediction of both direct and event-driven system states.
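The windowing and variable derivation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the window length and stride, the space-vector definition of voltage magnitude, and the per-unit definition of voltage dip are all assumptions made here, since the summary does not give the exact formulas.

```python
import numpy as np


def sliding_windows(x, window, stride=1):
    """Stack overlapping windows: (n_samples, n_channels) -> (n_windows, window, n_channels)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D signal as a single channel
    n_windows = (x.shape[0] - window) // stride + 1
    if n_windows <= 0:
        raise ValueError("signal is shorter than the window")
    starts = stride * np.arange(n_windows)[:, None]
    idx = starts + np.arange(window)[None, :]
    return x[idx]


def voltage_magnitude(v_abc):
    """Assumed definition: space-vector amplitude sqrt(2/3 * (va^2 + vb^2 + vc^2)).

    For a balanced three-phase set this equals the per-phase peak amplitude.
    """
    v_abc = np.asarray(v_abc, dtype=float)
    return np.sqrt(2.0 / 3.0 * np.sum(v_abc ** 2, axis=-1))


def total_active_power(p_dg):
    """Sum per-DG active power, shape (n_samples, n_dg) -> (n_samples,)."""
    return np.sum(np.asarray(p_dg, dtype=float), axis=-1)


def voltage_dip(v_mag, v_nominal):
    """Assumed definition: per-unit depth below nominal, zero when at or above nominal."""
    return np.clip(1.0 - np.asarray(v_mag, dtype=float) / v_nominal, 0.0, None)


# Synthetic balanced 60 Hz set with a 30 % sag in the second half (illustrative only).
t = np.arange(2000) / 20_000.0
amp = np.where(t < 0.05, 1.0, 0.7)
phases = np.array([0.0, -2 * np.pi / 3, 2 * np.pi / 3])
v_abc = amp[:, None] * np.cos(2 * np.pi * 60.0 * t[:, None] + phases)

v_mag = voltage_magnitude(v_abc)
v_dip = voltage_dip(v_mag, v_nominal=1.0)
X = sliding_windows(v_abc, window=200, stride=50)  # (n_windows, 200, 3)
```

Index-based windowing copies the data, which is simple and safe for modest window counts. For very long EMT traces a strided view would use less memory.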

Surrogate Model Design

A structured pipeline preprocesses the time-series dataset for two parallel surrogate architectures. CNNs directly encode windowed waveform sequences, excelling at local temporal feature extraction, while LightGBM processes statistical summaries (mean, variance, extrema) of the same windows, optimizing for structured regression. Both approaches target multi-output prediction of key microgrid variables: voltage magnitude ($V_{\mathrm{mag}}$), frequency ($f_{\mathrm{DG1}}$), total active power ($P_{\mathrm{total}}$), and voltage dip severity ($V_{\mathrm{dip}}$).
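As a rough illustration of the feature branch, the statistical summaries named above (mean, variance, extrema) can be computed per window and per channel as follows. The function name `window_features` and the column ordering are assumptions of this sketch, and the paper may use additional descriptors.

```python
import numpy as np


def window_features(windows):
    """Summarize each window by mean, variance, min and max per channel.

    windows: (n_windows, window_len, n_channels)
    returns: (n_windows, 4 * n_channels), ordered [means, variances, minima, maxima]
    """
    windows = np.asarray(windows, dtype=float)
    feats = [
        windows.mean(axis=1),
        windows.var(axis=1),  # population variance (ddof=0)
        windows.min(axis=1),
        windows.max(axis=1),
    ]
    return np.concatenate(feats, axis=1)


# One window of length 2 with two channels.
example = np.array([[[1.0, 10.0], [3.0, 30.0]]])
features = window_features(example)
```

The resulting matrix is the kind of tabular input a gradient-boosting regressor such as LightGBM's `LGBMRegressor` expects. Because that estimator predicts a single output, fitting one regressor per target is one plausible arrangement; the summary does not say how the multi-output case is handled.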

Figure 3: Parallel surrogate model workflow for rapid prediction; CNN learns directly from raw sequence input, LightGBM exploits engineered features.

LightGBM operates as an ensemble of iterative regression trees trained on window-based descriptors. The CNN branches consist of stacked one-dimensional convolutional and pooling layers, followed by a dense output head. Both are trained for single-sample prediction using mean squared error loss, with model selection determined by validation loss under disturbance-rich conditions.
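To make the CNN branch concrete, the following NumPy sketch runs a forward pass through stacked one-dimensional convolution and pooling layers followed by a dense head, and evaluates a mean squared error. Layer counts, channel widths, kernel size, ReLU activations, and the helper names are all assumptions of this sketch; the paper's actual hyperparameters are in its Table 1, and training would normally use a deep-learning framework rather than hand-written NumPy.

```python
import numpy as np


def conv1d(x, kernels, bias):
    """Valid 1-D cross-correlation. x: (length, in_ch); kernels: (k, in_ch, out_ch)."""
    k = kernels.shape[0]
    out_len = x.shape[0] - k + 1
    out = np.empty((out_len, kernels.shape[2]))
    for t in range(out_len):
        # Contract the time and input-channel axes of the current patch.
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool1d(x, size):
    """Non-overlapping max pooling along time; trailing samples that do not fill a pool are dropped."""
    n = x.shape[0] // size
    return x[:n * size].reshape(n, size, x.shape[1]).max(axis=1)


def init_params(rng, window, in_ch, n_targets, channels=(8, 16), k=3):
    """Random weights for a hypothetical two-block CNN (sizes are illustrative)."""
    conv, length, c_in = [], window, in_ch
    for c_out in channels:
        conv.append((0.1 * rng.standard_normal((k, c_in, c_out)), np.zeros(c_out)))
        length = (length - k + 1) // 2  # valid convolution, then pooling by 2
        c_in = c_out
    dense = (0.1 * rng.standard_normal((length * c_in, n_targets)), np.zeros(n_targets))
    return {"conv": conv, "dense": dense}


def cnn_forward(x, params):
    """Map one window (window_len, in_ch) to a vector of n_targets predictions."""
    h = x
    for kernels, bias in params["conv"]:
        h = max_pool1d(relu(conv1d(h, kernels, bias)), size=2)
    w, b = params["dense"]
    return h.reshape(-1) @ w + b


def mse(pred, target):
    """Mean squared error, the training loss named in the text."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))


rng = np.random.default_rng(0)
params = init_params(rng, window=64, in_ch=6, n_targets=4)
prediction = cnn_forward(rng.standard_normal((64, 6)), params)  # shape (4,)
```

The four outputs here stand in for $V_{\mathrm{mag}}$, $f_{\mathrm{DG1}}$, $P_{\mathrm{total}}$, and $V_{\mathrm{dip}}$. The six input channels are an arbitrary choice for the example.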

Model Training and Out-of-Distribution Generalization

CNN and LightGBM are independently trained on the digital twin dataset. The test regime incorporates out-of-distribution data, including additive noise and synthetic communication delays, to evaluate reliability under degraded measurement quality. Early stopping and hyperparameter optimization (see Table 1 in the paper) are used to avoid overfitting.
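One simple way to build such degraded test inputs is sketched here. The summary does not state how noise and delay are modeled, so white Gaussian noise at a fixed signal-to-noise ratio and a constant sample-and-hold delay are assumptions, as are the function names.

```python
import numpy as np


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise so each channel has roughly the requested SNR in dB."""
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2, axis=0, keepdims=True)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.standard_normal(x.shape) * np.sqrt(noise_power)


def apply_delay(x, delay_samples):
    """Emulate a constant communication delay: the value seen at step t is the true value at t - d.

    The first d steps repeat the initial sample, since nothing earlier has arrived yet.
    """
    x = np.asarray(x, dtype=float)
    if delay_samples <= 0:
        return x.copy()
    d = min(delay_samples, x.shape[0])
    out = np.empty_like(x)
    out[:d] = x[0]
    out[d:] = x[:x.shape[0] - d]
    return out


rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 60.0 * np.arange(10_000) / 20_000.0)
noisy = add_noise(clean, snr_db=20.0, rng=rng)
delayed = apply_delay(clean, delay_samples=40)
```

Applying these perturbations only to held-out data, and never to the training set, is what makes the evaluation out-of-distribution in the sense used above.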

Figure 4: CNN and LightGBM convergence on multi-target regression, demonstrating stability under OOD scenarios including noise and delay.

Quantitative Results and Comparative Evaluation

Regression Performance

Extensive experimentation reveals variable-dependent optimal model selection:

Voltage Magnitude ($V_{\mathrm{mag}}$): CNN achieves superior fit ($R^2 = 0.837$), capturing nuanced time dependencies absent from engineered statistics.
System Frequency ($f_{\mathrm{DG1}}$): LightGBM achieves extremely high correspondence ($R^2 = 0.999$), benefiting from signal smoothness and structural invariance.
Total Active Power ($P_{\mathrm{total}}$): Both approaches yield $R^2 > 0.96$, with CNN slightly outperforming in tracking abrupt ramps.
Voltage Dip Severity ($V_{\mathrm{dip}}$): LightGBM outperforms CNN ($R^2 = 0.753$ vs. $0.267$), consistent with its strength on aggregated, event-derived features.

A hybrid strategy selecting the best model per variable maintains stable, high performance under all conditions.
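The per-variable selection can be sketched as follows: compute $R^2 = 1 - SS_{\mathrm{res}} / SS_{\mathrm{tot}}$ for each model and target on validation data, keep the better model for each target, and assemble the hybrid prediction column by column. The function names and the toy numbers are hypothetical, and the paper may combine the models differently.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def select_per_target(y_val, preds_by_model, target_names):
    """Return {target: (best_model_name, best_r2)} using validation predictions."""
    choice = {}
    for j, name in enumerate(target_names):
        scores = {m: r2_score(y_val[:, j], p[:, j]) for m, p in preds_by_model.items()}
        best = max(scores, key=scores.get)
        choice[name] = (best, scores[best])
    return choice


def hybrid_predict(preds_by_model, choice, target_names):
    """Take each target's column from the model selected for that target."""
    cols = [preds_by_model[choice[n][0]][:, j] for j, n in enumerate(target_names)]
    return np.column_stack(cols)


# Toy validation set with two targets; values are made up for illustration.
targets = ["V_mag", "V_dip"]
y_val = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
preds = {
    "cnn": np.array([[1.0, 20.0], [2.0, 20.0], [3.0, 20.0]]),       # exact on V_mag only
    "lightgbm": np.array([[2.0, 10.0], [2.0, 20.0], [2.0, 30.0]]),  # exact on V_dip only
}
choice = select_per_target(y_val, preds, targets)
y_hybrid = hybrid_predict(preds, choice, targets)
```

Selecting on a validation split rather than on the test set keeps the reported test scores from being optimistically biased by the selection itself.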

Figure 5: Surrogate model predictions versus ground truth—hybrids align closely for smooth signals, but CNN and LightGBM diverge under volatile dip events.

Error Analysis

Residual distributions are highly concentrated around zero for all but the most dynamic, event-driven outputs. Both methods suppress systematic bias, though CNN predictions for derived, intermittent variables such as voltage dip suffer wider error margins.

Figure 6: Histograms and kernel density estimates of prediction residuals; $V_{\mathrm{dip}}$ shows heavier tails due to event localization.

Computational Efficiency

Timing analysis demonstrates that LightGBM delivers speedups of over $1000\times$ compared to EMT simulation, running faster than real time and supporting deployment on resource-constrained hardware. The CNN model, while more accurate for time-localized variables, is more computationally demanding but still runs substantially faster than EMT simulation. The hybrid framework balances competing demands, achieving a speedup of over $500\times$ with near real-time performance while maintaining near-optimal prediction accuracy on all targets.
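The two quantities behind these statements can be written out explicitly. Speedup is the ratio of EMT wall-clock time to surrogate wall-clock time for the same simulated interval, and a real-time factor below 1 means the model finishes faster than the physical time it covers. The timings in this sketch are hypothetical placeholders, not measurements from the paper.

```python
def speedup(t_emt_s, t_surrogate_s):
    """How many times faster the surrogate is than EMT for the same simulated interval."""
    return t_emt_s / t_surrogate_s


def real_time_factor(wall_clock_s, simulated_s):
    """Wall-clock seconds spent per simulated second; < 1 means faster than real time."""
    return wall_clock_s / simulated_s


# Hypothetical timings for 1 s of simulated microgrid operation.
simulated_s = 1.0
t_emt_s = 600.0      # assumed EMT solver wall-clock time
t_surrogate_s = 0.5  # assumed surrogate inference wall-clock time

s = speedup(t_emt_s, t_surrogate_s)              # 1200.0
rtf = real_time_factor(t_surrogate_s, simulated_s)  # 0.5
```

Under these assumed numbers the surrogate would be 1200 times faster than EMT and would run at half of real time. A large speedup alone does not imply real-time operation, which is why both figures are worth reporting.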

Figure 7: Relative accuracy ($R^2$) and wall-clock runtime (log-scaled), highlighting tradeoffs between CNN, LightGBM, and their hybridization.

Implications and Prospects

This research substantiates the practical feasibility of replacing EMT simulation with surrogate models for real-time transient state estimation in inverter-based microgrids. The differentiation in model suitability by output variable underscores the merit of combining temporal sequence learners (CNNs) with feature-optimized ensemble regressors (LightGBM). The inclusion of OOD regimes (noise, delays) and robust error analysis strengthens the argument for their operational resilience.

From a theoretical standpoint, these results reinforce the view that feature engineering, temporal encoding, and disturbance-driven evaluation are all necessary for high-fidelity surrogate modeling in modern power electronics-rich grids. The study directly informs the design of fast “digital twin” environments for closed-loop operation, event detection, and adaptive control schemes.

Future work should extend to generalized multi-system training, online incremental updating, and transformer-based architectures for long-range temporal dependencies. Closer coupling with physical/physics-informed models may enhance interpretability and stability in untested domains. Integration with edge computing resources will facilitate deployment in real-world substations or microgrid controllers.

Conclusion

"Real-Time Surrogate Modeling for Fast Transient Prediction in Inverter-Based Microgrids Using CNN and LightGBM" (2603.29255) comprehensively demonstrates that data-driven surrogate models can accurately and rapidly approximate EMT simulation, meeting the stringent requirements for real-time microgrid monitoring. The findings point to tailored model selection—CNN for temporally dense signals, LightGBM for feature-centric or event-driven outputs—as essential for operationalizing surrogate approaches in modern, inverter-rich microgrids. The hybridization of temporal and feature-based machine learning provides a robust, scalable path forward for AI-accelerated real-time power system analysis.
