Neural Solvers for Power Flow
- Neural solvers are ML models that map system controls to state variables using physics-based residual minimization to ensure AC feasibility.
- They integrate techniques such as physics-informed loss functions and test-time adaptation, achieving order-of-magnitude speedups and reductions in physical residuals.
- These models enable flexible, modular integration in optimization workflows like Predict-then-Optimise for practical power system applications.
Neural solvers are machine-learning surrogates for the power flow (PF) and optimal power flow (OPF) problems that underlie operational and planning analyses in electrical power systems. Built on deep architectures such as graph neural networks (GNNs) or feedforward NNs, these surrogates offer large improvements in computational speed over classical iterative solvers. Recent developments aim to combine this speed with physical consistency, flexibility across tasks, and AC feasibility by formulating and minimizing physics-based residuals rooted in power system laws (Stiasny et al., 14 Jan 2026, Dogoulis et al., 27 Nov 2025, Za'ter et al., 17 Oct 2025).
1. Mathematical Foundations of Neural Solvers
Neural solvers fundamentally address the mapping from system controls (e.g., power injections, setpoints) to state variables (e.g., bus voltages and angles) in a manner consistent with the underlying nonlinear and nonconvex AC power flow (PF) equations. The nodal power-balance equations for an $N$-bus network with admittance matrix $Y = G + jB$ are

$$P_i = V_i \sum_{j=1}^{N} V_j \left( G_{ij}\cos(\theta_i - \theta_j) + B_{ij}\sin(\theta_i - \theta_j) \right),$$

$$Q_i = V_i \sum_{j=1}^{N} V_j \left( G_{ij}\sin(\theta_i - \theta_j) - B_{ij}\cos(\theta_i - \theta_j) \right), \qquad i = 1, \dots, N,$$

where $V_i$ and $\theta_i$ are the voltage magnitudes and angles at bus $i$ (Dogoulis et al., 27 Nov 2025, Stiasny et al., 14 Jan 2026).
The Residual Power Flow (RPF) formulation systematically encodes these constraints via residual vectors built from both Kirchhoff's Current Law (KCL) and Kirchhoff's Voltage Law (KVL). For state vector $x$ and controls $u$, the KCL and KVL residuals are assembled as real vectors whose stacked form $r(x, u)$ determines feasibility: a state-control pair is AC-feasible if and only if $r(x, u) = 0$ (Stiasny et al., 14 Jan 2026).
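As a concrete illustration, a minimal NumPy sketch of assembling the stacked KCL residual vector $r(x, u)$ from the power-balance equations above (the function and variable names are illustrative, not taken from the cited papers):

```python
import numpy as np

def pf_residuals(V, theta, P_spec, Q_spec, G, B):
    """Stack the power-balance (KCL) residuals for an N-bus network.

    V, theta       : voltage magnitudes and angles (state x)
    P_spec, Q_spec : specified net injections (controls u)
    G, B           : real and imaginary parts of the admittance matrix Y
    """
    dtheta = theta[:, None] - theta[None, :]              # theta_i - theta_j
    P_calc = V * ((G * np.cos(dtheta) + B * np.sin(dtheta)) @ V)
    Q_calc = V * ((G * np.sin(dtheta) - B * np.cos(dtheta)) @ V)
    # The pair (x, u) is AC-feasible iff this stacked vector is zero
    return np.concatenate([P_calc - P_spec, Q_calc - Q_spec])
```

A candidate pair is then accepted as feasible when `np.linalg.norm(pf_residuals(...))` falls below a numerical tolerance.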
2. Residual-Based Learning and Surrogate Training
In neural solvers, the central training target is often a residual norm, typically

$$\rho(x, u) = \| r(x, u) \|_2^2,$$

where $r(x, u)$ stacks all active/reactive power (KCL) as well as KVL residuals; minimization of $\rho$ over $x$ under varying $u$ yields the "best effort" or slack-adjusted AC-feasible solution (Stiasny et al., 14 Jan 2026). This residual-centric perspective enables continuous feasibility quantification, bypassing discrete bus-type distinctions and supporting a more flexible, reusable modeling interface.
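A minimal sketch of this inner best-effort solve, minimizing $\|r(x, u)\|_2^2$ over the state with SciPy (the flat-start initialization and solver choice are assumptions; `pf_residuals` is the helper sketched above):

```python
import numpy as np
from scipy.optimize import least_squares

def solve_rpf(P_spec, Q_spec, G, B, N):
    """Best-effort inner solve: minimize ||r(x, u)||_2^2 over the state x."""
    x0 = np.concatenate([np.ones(N), np.zeros(N)])    # flat start (V=1, theta=0)

    def stacked_residual(x):
        V, theta = x[:N], x[N:]                       # unpack state vector
        return pf_residuals(V, theta, P_spec, Q_spec, G, B)

    sol = least_squares(stacked_residual, x0)         # Gauss-Newton-type solve
    return sol.x, 2.0 * sol.cost                      # state and ||r||^2
```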
Feature maps $\phi$: These may be linear or implemented as small feedforward NNs, and the main prediction then takes the form $\hat{x}(u) = \Theta\,\phi(u)$, with parameter vector $\theta = \mathrm{vec}(\Theta)$. Training minimizes mean squared error (MSE) between predicted and reference voltage (or state) vectors, and AD-based frameworks allow for the seamless outer-loop differentiation required for embedded optimization (Stiasny et al., 14 Jan 2026).
Training data are generated by sampling feasible (and, for robustness, infeasible) operating conditions on benchmark networks (e.g., the IEEE 9-bus system); ground-truth labels are obtained by solving the inner RPF minimization, potentially with slack-variable adjustment (Stiasny et al., 14 Jan 2026).
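A hedged PyTorch sketch of this supervised training loop (the network sizes, optimizer settings, and the `sample_training_pairs` data loader are illustrative assumptions):

```python
import torch
import torch.nn as nn

N, n_controls, n_epochs = 9, 18, 200      # illustrative sizes for a 9-bus case

# Small feedforward surrogate: controls u -> predicted state (V, theta)
surrogate = nn.Sequential(
    nn.Linear(n_controls, 64), nn.Tanh(),
    nn.Linear(64, 2 * N),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for epoch in range(n_epochs):
    # sample_training_pairs (hypothetical) draws controls and RPF-solved labels
    u_batch, x_ref = sample_training_pairs()
    loss = nn.functional.mse_loss(surrogate(u_batch), x_ref)  # MSE on states
    opt.zero_grad()
    loss.backward()     # the same AD graph supports outer-loop differentiation
    opt.step()
```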
3. Physics-Informed Losses and Test-Time Adaptation
Modern neural solvers incorporate explicit physics-based loss terms to enforce AC power flow consistency and operational constraints (e.g., voltage and line-flow limits) during both training and inference. In the physics-informed Test-Time Training (PI-TTT) framework (Dogoulis et al., 27 Nov 2025), a surrogate model predicts bus voltages and angles, from which the power mismatches

$$\Delta P_i = P_i^{\text{spec}} - P_i(V, \theta), \qquad \Delta Q_i = Q_i^{\text{spec}} - Q_i(V, \theta)$$

are computed against the specified injections. At inference, PI-TTT refines the model parameters via a small number of gradient-based updates (5–20 steps with a small learning rate) to minimize a loss that aggregates these residuals and operational constraint violations.
The total test-time loss takes the form

$$\mathcal{L}_{\text{TTT}} = \|\Delta P\|_2^2 + \|\Delta Q\|_2^2 + \lambda_V\,\mathcal{P}_V + \lambda_F\,\mathcal{P}_F,$$

with ReLU-based penalties $\mathcal{P}_V$ and $\mathcal{P}_F$ (weighted by $\lambda_V$, $\lambda_F$) for voltage- and flow-limit violations. Empirically, PI-TTT achieves 1–2 orders of magnitude reductions in power flow residuals and violations at modest computational overhead (e.g., IEEE 14-bus: forward pass 3 ms, PI-TTT refinement +14 ms, vs. Newton–Raphson 19 ms) (Dogoulis et al., 27 Nov 2025).
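A minimal PyTorch sketch of such a test-time refinement loop (the step count, learning rate, penalty weights, and the `power_residuals`, `voltage_violation`, and `flow_violation` helpers are illustrative assumptions, not the paper's exact settings):

```python
import copy
import torch

def pi_ttt_refine(model, u, n_steps=10, lr=1e-3, lam_v=1.0, lam_f=1.0):
    """Refine surrogate parameters at inference by descending the physics loss."""
    model = copy.deepcopy(model)               # leave the trained weights intact
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_steps):
        V, theta = model(u)                    # predicted voltages and angles
        dP, dQ = power_residuals(V, theta, u)  # mismatch vs. specified injections
        # voltage_violation / flow_violation (hypothetical) return ReLU'd excesses
        loss = (dP.pow(2).sum() + dQ.pow(2).sum()
                + lam_v * voltage_violation(V).pow(2).sum()
                + lam_f * flow_violation(V, theta).pow(2).sum())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model(u)                            # refined prediction
```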
4. Predict-then-Optimise (PO) and Modular Integration
A central architectural advance is the integration of neural solvers within Predict-then-Optimise (PO) workflows (Stiasny et al., 14 Jan 2026). Here, fast surrogates provide voltage/state predictions $\hat{x}(u)$ for candidate controls $u$, which are then refined or further optimized within outer-loop problems such as AC-OPF:

$$\min_{u_d}\; c\big(u, \hat{x}(u)\big) \quad \text{subject to operational constraints},$$

where $u = (u_d, u_f)$ splits decision and non-decision controls. Gradients with respect to $u_d$ are obtained through AD, enabling fast outer-loop convergence.
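A minimal sketch of such an outer loop, differentiating the cost through a frozen surrogate with PyTorch AD (the control split, cost function, and optimizer settings are illustrative assumptions):

```python
import torch

def predict_then_optimise(surrogate, u_fixed, u_dec_init, cost_fn,
                          n_iters=100, lr=1e-2):
    """Outer-loop optimization of decision controls u_d through a surrogate."""
    u_dec = u_dec_init.clone().requires_grad_(True)    # decision controls u_d
    opt = torch.optim.Adam([u_dec], lr=lr)
    for _ in range(n_iters):
        u = torch.cat([u_dec, u_fixed])                # full control vector u
        x_hat = surrogate(u)                           # fast state prediction
        loss = cost_fn(u, x_hat)                       # e.g., generation cost
        opt.zero_grad()
        loss.backward()                                # d(cost)/d(u_d) via AD
        opt.step()
    return u_dec.detach()
```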
Key advantages are:
- Removal of rigid bus-type asymmetries.
- Flexibility to switch between tasks (e.g., AC-OPF, state estimation, quasi-steady-state PF) without retraining the inner neural block.
- Quantification and minimization of infeasibility via residual norms.
5. Residual Correction and Hybrid Architectures
Residual learning strategies extend neural solvers to hybrid settings, particularly the correction of linear baseline solutions (such as DC-OPF) to full nonlinear AC-feasible ones (Za'ter et al., 17 Oct 2025). The Residual Correction Model defines the AC-OPF solution as

$$x_{\text{AC}} = x_{\text{DC}} + \Delta_\theta(x_{\text{DC}}),$$

where $x_{\text{DC}}$ is constructed from the DC-OPF output, and $\Delta_\theta$ is a topology-aware GNN mapping DC variables to the necessary nonlinear corrections.
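A hedged PyTorch Geometric sketch of such a correction model (the layer widths, feature layout, and GCN choice are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class ResidualCorrectionGNN(nn.Module):
    """Topology-aware correction: x_AC = x_DC + Delta_theta(x_DC)."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(n_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, n_features)      # per-bus residual head

    def forward(self, x_dc, edge_index):
        # Message passing over the grid topology, then predict the correction
        h = torch.relu(self.conv1(x_dc, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return x_dc + self.head(h)                     # DC baseline + correction
```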
The training loss comprises supervised fit, physics-informed residual penalties (enforcing AC power-flow feasibility and operational limits), cost-optimality objectives, and regularization of residual heads. Experiments with IEEE 57-, 118-, and 2000-bus systems show MSE reductions (up to 40%), 3× lower feasibility errors, and up to 13× speedup (e.g., 2000-bus: neural inference 7 s, AC-IPOPT 93 s). Robustness to N−1 contingencies is maintained, with small error increases in zero-shot tests and full accuracy upon fine-tuning (Za'ter et al., 17 Oct 2025).
6. Quantitative Evaluation and Performance Insights
Evaluation across a range of tasks and test systems demonstrates the strengths and boundaries of current neural solvers:
| Benchmark | Approach | Error (V, P) [p.u.] | Feasibility Error | Speedup vs. AC-IPOPT |
|---|---|---|---|---|
| IEEE 14-bus | PowerFlowNet | MSE 0.924 | — | — |
| IEEE 14-bus | PowerFlowNet + PI-TTT | MSE 0.047 | — | ∼1.3× |
| IEEE 57-bus | Residual-Corr. GNN | MSE ↓40% | 3× lower | — |
| IEEE 118-bus | Residual-Corr. GNN | MSE 3.1e-4 | 3× lower | 11× |
| PEGASE 1354 | PowerFlowNet + PI-TTT | RMSE (P) 0.72 | max abs. violation ↓ | — |
| 2000-bus | Residual-Corr. GNN | ≈50% of best baseline | — | 13× |
Physical residuals and operational constraint violations are consistently reduced by up to two orders of magnitude when employing residual-based test-time refinement or correction architectures (Dogoulis et al., 27 Nov 2025, Stiasny et al., 14 Jan 2026, Za'ter et al., 17 Oct 2025).
7. Scalability, Flexibility, and Limitations
Recent neural solver frameworks scale efficiently to networks of thousands of buses due to GNN architectures and message passing based on local topology (Za'ter et al., 17 Oct 2025). Training is achieved with moderate data requirements (e.g., 2,000 points for a 9-bus case) and standard optimizers (L-BFGS, Adam). Inference runtimes are in the 30 μs–14 ms range, offering near real-time feasibility checks and optimization.
However, on large or highly nonlinear networks, a fixed number of gradient refinement steps may be insufficient to drive residuals to zero; richer update parameterizations or additional iterations may be required. Success also presupposes a high-quality initial surrogate: poor initialization can slow or impede convergence (Dogoulis et al., 27 Nov 2025). The reusability and generalization features of RPF and related formulations mitigate task inflexibility, but optimality gaps may persist in domains whose dynamics or topology lie far outside the training regime (Stiasny et al., 14 Jan 2026).
Neural solvers represent a foundational data-driven paradigm for power-system analysis, integrating the speed of learned surrogates with physical reliability and reusability across operational and optimization tasks. The interplay of residual formulation, test-time adaptation, and modular optimization enables versatility beyond classical ML-based proxies (Stiasny et al., 14 Jan 2026, Dogoulis et al., 27 Nov 2025, Za'ter et al., 17 Oct 2025).