Physics-Guided Machine Learning
- Physics-guided machine learning is a framework that integrates known physical principles, such as conservation laws and PDEs, into ML model design to ensure reliable and physically consistent predictions.
- It employs strategies like physics-based loss functions, architectural embedding, and operator injection to reduce data requirements and enhance model interpretability.
- PGML methods achieve improved generalization, efficiency, and uncertainty quantification, making them well suited to inverse problems and complex scientific applications.
Physics-guided machine learning (PGML) refers to a family of frameworks and algorithms that structurally incorporate prior physical knowledge—such as governing equations, conservation laws, symmetry principles, and domain constraints—into machine learning model design, training, and inference. This paradigm is motivated by the need for data-efficient, robust, and physically consistent predictive models in scientific and engineering applications, particularly those where data are scarce, high-fidelity simulations are expensive, and reliability and interpretability are paramount.
1. Fundamental Principles of Physics-Guided Machine Learning
The central tenet of PGML approaches is the systematic fusion of physics-based models or constraints with statistical learning architectures. The ways in which physical information is integrated include:
- Physics-based loss and regularization: Penalizing violation of known equations or constraints during training through dedicated loss terms.
- Architectural embedding: Designing neural architectures or signal pathways such that physical rules (e.g., conservation, objectivity, boundary conditions) are enforced exactly by construction (see the sketch at the end of this section).
- Operator injection: Incorporating outputs or intermediate features derived from analytical or numerical physics models (e.g., Galerkin projection, panel methods, PDE solvers) into machine learning pipelines, often at intermediate network layers rather than as simple input augmentations.
- Adjoint and hybrid workflows: Coupling automatic differentiation across learned networks and existing physics engines via adjoint equations to enable efficient end-to-end parameter identification.
- Physics-informed features or kernels: Using function libraries, implicit models, or kernel spaces derived from physical considerations to guide hypothesis space selection.
This paradigm distinguishes itself from purely data-driven ("black box") approaches by the explicit requirement that any learned mapping or latent representation remains compatible with fundamental physical knowledge (Pawar et al., 2021, Pawar et al., 2020, Park et al., 2024).
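As a concrete illustration of architectural embedding, the following minimal PyTorch sketch (illustrative, not taken from any of the cited papers) enforces a homogeneous Dirichlet boundary condition exactly by construction: the multiplicative prefactor vanishes on the boundary for any network weights, so no loss term is needed to enforce the constraint.

```python
import torch
import torch.nn as nn

class HardBCNet(nn.Module):
    """u_theta(x) = x * (1 - x) * N_theta(x): the prefactor vanishes at
    x = 0 and x = 1, so the Dirichlet BC holds for any weights."""
    def __init__(self, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * (1.0 - x) * self.net(x)

model = HardBCNet()
u_bnd = model(torch.tensor([[0.0], [1.0]]))
assert torch.allclose(u_bnd, torch.zeros(2, 1))  # BC satisfied by construction
```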
2. Representative Frameworks and Algorithms
PGML frameworks take diverse forms tailored to the problem class and the nature of available physics:
| Framework / Paper | Domain/Problem | Integration Strategy |
|---|---|---|
| rRIM (Park et al., 2024) | Optical inverse problems | Iterative, physics-encoded recurrent update with explicit forward/adjoint operators |
| PGML with simplified theories (Pawar et al., 2020) | Fluid mechanics, aerodynamics | Panel-method features injected into hidden layers of feedforward networks |
| AdjointNet (Karra et al., 2021) | PDE-based inversion | Embedding black-box physics solver with adjoint for differentiable training |
| Physics-guided hierarchical CNNs (Lynch et al., 24 Feb 2025) | Photonics, Maxwell solvers | Hierarchical U-Net with physical analytic layers and loss terms for Maxwell’s equations |
| Embedded-physics ML (Schöberl et al., 2020) | Molecular dynamics, coarse-graining | Reverse-KL objective with respect to physical force fields; no large MD datasets required |
| Variational Multiscale ROM PGML (Ahmed et al., 2022) | Model reduction for PDEs | LSTM closures with Galerkin operators as injected features; multiscale VMS closure modeled by ML |
| Physics-constrained Extreme Learning (Zhuang et al., 24 Oct 2025) | Inverse Stefan (phase change) | Closed-form output weights satisfying all physics constraints via Moore-Penrose pseudo-inverse |
| Probabilistic PGML surrogates (Deo et al., 30 Sep 2025) | Underwater acoustics | Physics-informed mean (analytic) + neural encoder + SVGP residual, with calibrated uncertainty |
| Kernel-based PGML (Doumèche, 11 Jul 2025) | Time series, general regression | PINN and kernel ridge regression under linear/operator PDE priors, with quantifiable rates |
The architectural and optimization strategies in these frameworks aim to hardwire physical laws (e.g., conservation, symmetries, admissible state-spaces) into every step of model computation, thus reducing sample complexity, improving out-of-distribution generalization, and ensuring physically plausible extrapolative behavior (Doumèche, 11 Jul 2025, Lynch et al., 24 Feb 2025, Park et al., 2024).
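As an illustration of the operator-injection strategy exemplified by the panel-method row in the table above (Pawar et al., 2020), the following minimal sketch concatenates physics-derived features into an intermediate layer rather than appending them to the raw inputs; all names and dimensions are illustrative assumptions, not the original architecture.

```python
import torch
import torch.nn as nn

class OperatorInjectionMLP(nn.Module):
    """Concatenates physics-derived features (e.g., low-fidelity panel-method
    outputs) into an intermediate layer rather than the input layer."""
    def __init__(self, in_dim: int, phys_dim: int, width: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU())
        # the layers after injection see learned plus physics features
        self.head = nn.Sequential(
            nn.Linear(width + phys_dim, width), nn.ReLU(),
            nn.Linear(width, 1),
        )

    def forward(self, x: torch.Tensor, phi_phys: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        return self.head(torch.cat([h, phi_phys], dim=-1))

model = OperatorInjectionMLP(in_dim=10, phys_dim=3)
y = model(torch.randn(4, 10), torch.randn(4, 3))  # physics injected mid-network
```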
3. Mathematical Foundation and Loss Formulations
Physics-guided learning is typically posed as a composite optimization problem. For supervised regression with physical priors (e.g., PDE constraints), the canonical objective combines a data term and a physics term, $\mathcal{L}(\theta) = \mathcal{L}_{\text{data}}(\theta) + \lambda\,\mathcal{L}_{\text{phys}}(\theta)$:
- Data-fidelity term: Measures mismatch between model predictions and observed data, e.g., mean-squared error.
- Physics regularizer: Penalizes residuals of the governing equations or violations of constraint sets at collocation points or globally, e.g., $\mathcal{L}_{\text{phys}} = \frac{1}{N_c}\sum_{i=1}^{N_c}\big\|\mathcal{N}[u_\theta](x_i)\big\|^2$, where $\mathcal{N}$ is a PDE operator (see the sketch after this list).
- Architectural constraints: In some frameworks, the architecture itself enforces the physics, thus the regularizer is intrinsic or even absent (e.g., phase-field fracture, symplectic Hamiltonian networks, monotonicity-constrained surrogates) (Aldakheel et al., 13 Feb 2025, Tong, 2024, Jadhav et al., 5 Jan 2026).
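A minimal sketch of such a composite loss for a 1D Poisson problem $-u''(x) = f(x)$, with the weighting $\lambda$ and the problem itself as illustrative assumptions:

```python
import torch

def pde_residual(model, x_col, f):
    """Residual of -u''(x) = f(x) at collocation points via autograd."""
    x_col = x_col.requires_grad_(True)
    u = model(x_col)
    du = torch.autograd.grad(u.sum(), x_col, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x_col, create_graph=True)[0]
    return -d2u - f(x_col)

def composite_loss(model, x_data, y_data, x_col, f, lam=1.0):
    data_term = torch.mean((model(x_data) - y_data) ** 2)       # data fidelity
    phys_term = torch.mean(pde_residual(model, x_col, f) ** 2)  # PDE penalty
    return data_term + lam * phys_term
```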
For ill-posed inverse problems, physics embedding is manifest in the recurrent update schemes (e.g., rRIM):
Schematically, $x_{k+1} = x_k + g_\theta\big(x_k,\, A^{*}(A x_k - y)\big)$, with loss $\mathcal{L} = \sum_k \|x_k - x^\star\|^2 + \lambda\,R(x_k)$, where $A$ and $A^{*}$ denote the forward and adjoint operators and $R$ enforces smoothness, positivity, or moment constraints (Park et al., 2024).
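A minimal sketch of such a physics-encoded recurrent update, with a linear forward operator standing in for the actual forward/adjoint physics of rRIM (the cell architecture, step count, and initialization are illustrative assumptions):

```python
import torch
import torch.nn as nn

class RecurrentInversion(nn.Module):
    """Iterates x_{k+1} = x_k + g_theta(x_k, A^T (A x_k - y))."""
    def __init__(self, dim: int, steps: int = 10):
        super().__init__()
        self.cell = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(),
                                  nn.Linear(dim, dim))
        self.steps = steps

    def forward(self, y: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        x = torch.zeros(y.shape[0], A.shape[1])   # init in parameter space
        for _ in range(self.steps):
            grad = (x @ A.T - y) @ A              # adjoint of the data residual
            x = x + self.cell(torch.cat([x, grad], dim=-1))
        return x

A = torch.randn(20, 8)                # toy forward operator: 20 data, 8 params
x_hat = RecurrentInversion(dim=8)(torch.randn(4, 20), A)
```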
Hybrid frameworks with adjoint solvers (e.g., AdjointNet, PyTorch–Firedrake) propagate gradients through the full forward and backward physics chain, ensuring consistency with numerical simulation at each update (Bouziani et al., 2023, Karra et al., 2021).
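A minimal sketch of the adjoint-coupling pattern, using a dense linear solve as a stand-in for a black-box physics solver; this is the generic custom-backward idiom, not the AdjointNet or Firedrake API.

```python
import torch

class SolverWithAdjoint(torch.autograd.Function):
    @staticmethod
    def forward(ctx, K, f):
        u = torch.linalg.solve(K, f)            # forward physics: K u = f
        ctx.save_for_backward(K, u)
        return u

    @staticmethod
    def backward(ctx, grad_u):
        K, u = ctx.saved_tensors
        lam = torch.linalg.solve(K.T, grad_u)   # adjoint solve: K^T lam = dL/du
        grad_K = -lam @ u.T                     # dL/dK from the adjoint variable
        return grad_K, lam                      # dL/df = lam

K = torch.eye(3) + 0.1 * torch.rand(3, 3)
K.requires_grad_(True)
f = torch.ones(3, 1, requires_grad=True)
loss = SolverWithAdjoint.apply(K, f).pow(2).sum()
loss.backward()   # gradients flow through the solver via the adjoint
```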
For uncertainty quantification, PGML can retain hard constraints by allowing a learned (e.g., CNN-driven) component to modulate only the magnitudes of physically admissible perturbations, not their mechanisms, ensuring physical realizability (e.g., clamping in the EPM for turbulence models) (Chu et al., 7 Nov 2025).
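A minimal sketch of this magnitude-only modulation, with clamping bounds and field names as illustrative assumptions (the EPM details in Chu et al. (7 Nov 2025) may differ):

```python
import torch

def perturbed_field(base_field, delta_phys, learned_amp, max_amp=1.0):
    """Learned component rescales only the magnitude of a fixed, physically
    admissible perturbation; the mechanism itself is never altered."""
    amp = torch.clamp(learned_amp, 0.0, max_amp)  # hard bound on magnitude
    return base_field + amp * delta_phys
```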
4. Applications and Quantitative Performance
Physics-guided machine learning has demonstrated substantial advances across a range of tasks:
- Inverse problems: Achieves robust recovery of underlying physical quantities (e.g., pairing glue in optics, phase-field in fracture) with far fewer samples and better out-of-distribution performance than black-box networks (Park et al., 2024, Aldakheel et al., 13 Feb 2025).
- Reduced order modeling: Enables data-efficient closure of ROMs via multiscale separation, LSTM closures, physics-injected latent representations, and achieves up to two orders of magnitude reduction in error vs. standard Galerkin ROMs (Ahmed et al., 2022, Pawar et al., 2021).
- Surrogate modeling and digital twins: Combines fast analytic means (e.g., geometric spreading in acoustics) with neural encoders and Gaussian processes for calibration and UQ, enabling substantial inference-time acceleration relative to full physics solvers (Deo et al., 30 Sep 2025).
- Constraint enforcement: Frameworks such as PIELM outperform PINNs by orders of magnitude in both accuracy and training time for moving-boundary problems, via global linear solves instead of iterative gradient descent (Zhuang et al., 24 Oct 2025); see the sketch after this list.
- Time series and forecasting: Kernel-based PGML and hybrid PINN approaches yield state-of-the-art prediction for electricity demand, mobility-aware energy systems, and hierarchical time series under domain-encoded weak constraints, with well-understood generalization error and statistical guarantees (Doumèche, 11 Jul 2025).
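A minimal sketch of the closed-form extreme-learning solve, with a toy boundary constraint replacing the Stefan-problem physics of Zhuang et al. (the weights, sizes, and constraint weighting are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = rng.uniform(0, 1, (50, 1))
y_data = np.sin(2 * np.pi * x_data)               # toy regression targets
x_con = np.array([[0.0], [1.0]])                  # constraint points: u(0)=u(1)=0

W, b = rng.normal(size=(1, 100)), rng.normal(size=100)
H = lambda x: np.tanh(x @ W + b)                  # fixed random hidden layer

# Stack data rows and (weighted) constraint rows into one linear system,
# then obtain the output weights in closed form via the pseudo-inverse.
A = np.vstack([H(x_data), 10.0 * H(x_con)])
t = np.vstack([y_data, np.zeros((2, 1))])
beta = np.linalg.pinv(A) @ t                      # closed-form "training"

u = lambda x: H(x) @ beta                         # the fitted model
```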
Results in these studies consistently show not only improved test accuracy and RMSE but also closer adherence to physical ground truth (e.g., conservation, correct singular/critical behavior, interpretability). Unphysical predictions, such as mass leakage, negative dissipation, or boundary violations, are mitigated or eliminated.
5. Robustness, Generalization, and Efficiency
Several features of PGML frameworks are repeatedly highlighted:
- Sample efficiency: Embedding physical operators and constraints drastically reduces the required training set size (e.g., thousands vs. millions of samples), as demonstrated for optical spectrum inversion, MD coarse graining, and hierarchical photonics networks (Park et al., 2024, Schöberl et al., 2020, Lynch et al., 24 Feb 2025).
- Out-of-distribution generalization: Iterative update schemes, hierarchical/transfer-learned architectures, and hybrid means–residual decompositions retain predictive skill on test scenarios with underlying physical regimes or feature combinations not encountered during training (Park et al., 2024, Lynch et al., 24 Feb 2025, Doumèche, 11 Jul 2025).
- Noise robustness: Physics-imposed structure at every training and inference step limits overfitting to spurious noise, as confirmed by error metrics evaluated across noise levels (Park et al., 2024).
- Optimization: Embedding physics directly in loss or architecture (e.g., via adjoints, monotonicity, symplectic integration, least-squares solvers) often reduces the complexity of hyperparameter tuning and the sensitivity to initialization (Zhuang et al., 24 Oct 2025, Tong, 2024, Jadhav et al., 5 Jan 2026).
- Uncertainty quantification: Modern frameworks incorporate probabilistic heads (e.g., SVGP, Bayesian ensembles) in conjunction with physics guidance, providing calibrated prediction intervals and facilitating decision-making under uncertainty (Deo et al., 30 Sep 2025, Chu et al., 7 Nov 2025, Pawar et al., 2021); see the sketch below.
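A minimal sketch of the physics-mean-plus-learned-residual decomposition with calibrated uncertainty, using an exact GP in place of the SVGP and neural encoder of Deo et al. (the spreading-loss mean and synthetic data are illustrative assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def physics_mean(r):
    """Analytic physics prior: spherical spreading loss, 20*log10(r)."""
    return 20.0 * np.log10(r)

rng = np.random.default_rng(0)
r = rng.uniform(10, 1000, (200, 1))                       # range in meters
tl = physics_mean(r) + 3 * np.sin(r / 100) + rng.normal(0, 0.5, (200, 1))

gp = GaussianProcessRegressor(RBF(100.0) + WhiteKernel(0.25), normalize_y=True)
gp.fit(r, (tl - physics_mean(r)).ravel())                 # learn only the residual

r_test = np.linspace(10, 1000, 50).reshape(-1, 1)
res_mean, res_std = gp.predict(r_test, return_std=True)   # calibrated intervals
tl_pred = physics_mean(r_test).ravel() + res_mean         # physics mean + residual
```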
6. Limitations and Open Directions
Known limitations and open challenges include:
- Complexity of physical prior: For very high-dimensional or highly nonlinear systems (e.g., turbulent flows with chaotic transitions), physical knowledge may not fully specify the hypothesis space; residual data-driven flexibility remains essential (Chu et al., 7 Nov 2025, Ahmed et al., 2022).
- Scalability: For linear-operator approaches (e.g., PIELM, kernel-based PGML), computational or memory bottlenecks arise as the number of collocation points or the feature dimension grows, necessitating dimensionality reduction, efficient solvers, or sparsification (Zhuang et al., 24 Oct 2025, Doumèche, 11 Jul 2025).
- Differentiable programming interfaces: Although libraries coupling machine learning frameworks and established physics solvers exist (e.g., PyTorch–Firedrake, AdjointNet), not all codes provide adjoint capabilities, and adapting legacy software may require substantial engineering (Bouziani et al., 2023, Karra et al., 2021).
- Uncertainty quantification: Bayesian/posterior frameworks are less mature for architecture-embedded approaches and require further development for rigorous confidence bounds (Park et al., 2024, Deo et al., 30 Sep 2025).
- Model misspecification: The fidelity of the “cheap” physics model (e.g., panel methods, reduced-order surrogates) can become a bottleneck that limits the performance and generalizability of the hybrid model (Pawar et al., 2020). Whether injecting physics at internal layers rather than input layers improves out-of-distribution accuracy remains problem-dependent.
Future directions include scalable implicit/online variants, meta-learning for automatic prior integration, adaptive selection of physical constraints, and broader applications to multiphysics, multiscale, or networked systems with sparse and irregular data (Tong, 2024, Lynch et al., 24 Feb 2025, Deo et al., 30 Sep 2025).
7. Impact and Outlook
Physics-guided machine learning provides a mathematically grounded and practically validated methodology for accelerating and improving modeling, inference, and control in systems governed by physical laws. Its success arises from the effective synergy between domain knowledge and modern machine/statistical learning. This suggests that as scientific workflows incorporate progressively richer sources of prior knowledge—ranging from explicit analytic formulae to empirical laws and output of sophisticated simulators—the PGML paradigm will become increasingly central across a broad spectrum of data-driven science and engineering (Park et al., 2024, Lynch et al., 24 Feb 2025, Doumèche, 11 Jul 2025, Ahmed et al., 2022).