Physics-Informed LSTM Models
- Physics-informed LSTM models are recurrent neural networks that integrate governing equations and physical constraints into their architecture.
- They use composite loss functions to balance empirical data errors with penalties for deviating from physical laws.
- This approach is applied in diverse domains such as seismic response, fluid thermodynamics, and chaotic system forecasting, demonstrating enhanced robustness and efficiency.
Physics-Informed Long Short-Term Memory (LSTM) Model
Physics-informed Long Short-Term Memory (LSTM) models are a class of recurrent neural network (RNN) surrogates that integrate domain knowledge, including governing equations and physical constraints, directly into their architecture and/or training objective. Unlike purely data-driven LSTM models, physics-informed variants enforce conformity to physical laws—often in the form of partial differential equations (PDEs), ordinary differential equations (ODEs), conservation constraints, or system-specific constitutive relationships—by augmenting loss functions or injecting physics features. This fusion yields improved accuracy, robustness, generalizability, and interpretability in time-series prediction for complex dynamical systems across scientific and engineering domains.
1. Core Principles and Motivations
Physics-informed LSTM models exploit the strengths of the LSTM cell—gated memory mechanisms that preserve gradients and encode long-term temporal dependencies—while explicitly embedding knowledge of underlying physical processes. This approach addresses key limitations observed in classical deep learning surrogates for dynamical systems:
- Generalization across domains and regimes: Purely data-driven models often fail to extrapolate outside training distributions or under limited data, whereas embedding governing equations regularizes learning.
- Physical consistency and interpretability: Enforcing residuals of physics (e.g., mass conservation, energy balance, or specific ODE/PDE forms) ensures predictions respect domain constraints.
- Data efficiency: Physics-informed LSTM models require less labeled data; physical loss terms can be enforced at additional collocation points even in unobserved regimes (Zhang et al., 2020).
- Numerical stability and convergence: Incorporating physics can mitigate overfitting, oscillatory/chaotic behavior, and divergence in challenging regimes (Tao et al., 25 Dec 2025).
2. Architectural Variants and Mathematical Formulation
Multiple architectures instantiate physics-informed LSTM frameworks, often tailored to the domain or type of spatiotemporal data:
(a) Sequential LSTM Architectures with Physics Loss
Standard LSTM update equations are preserved, typically with stacked LSTM layers receiving time series inputs (e.g., state, control, or feature sequences) and producing output trajectories (e.g., displacement, velocity, temperature) (Lahariya et al., 2022, Biswas et al., 26 Nov 2025).
Predicted outputs are constrained by specialized physics-informed loss terms enforcing ODE/PDE residuals or discretized equations (see Section 3).
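The gated update underlying these architectures can be sketched concretely. The following NumPy single-cell step is a minimal illustration of the standard LSTM equations; the stacked weight layout and gate ordering are conventions chosen here for compactness, not taken from any cited paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step.

    x: input at time t, shape (d_in,)
    h_prev, c_prev: previous hidden/cell states, shape (d_h,)
    W: input weights, shape (4*d_h, d_in); U: recurrent weights, (4*d_h, d_h)
    b: bias, shape (4*d_h,). Gate order here: input, forget, candidate, output.
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * d_h:1 * d_h])   # input gate
    f = sigmoid(z[1 * d_h:2 * d_h])   # forget gate
    g = np.tanh(z[2 * d_h:3 * d_h])   # candidate cell state
    o = sigmoid(z[3 * d_h:4 * d_h])   # output gate
    c = f * c_prev + i * g            # gated cell-state update preserves gradients
    h = o * np.tanh(c)                # new hidden state
    return h, c
```

In a stacked configuration, each layer's hidden-state sequence becomes the input sequence of the next layer, and the final layer's outputs feed the physics-informed loss terms.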
(b) Hybrid Architectures: U-Net/LSTM, ConvLSTM, Graph-LSTM
- Encoder-Decoder plus LSTM: Features are first extracted from input sequences (e.g., ground acceleration, image, or spatial snapshots) using 1D/2D CNN architectures such as causal U-Net or CAE, then propagated temporally by stacked LSTM layers (Biswas et al., 26 Nov 2025, Menicali et al., 16 May 2025).
- GraphSAGE-LSTM/GCN-LSTM: Node embeddings are computed via Graph Neural Networks (GNNs), encoding spatial topology or mesh structure, then evolved through time using LSTM gating (with elementwise operations or graph convolutions replacing the standard affine transforms) (Liu et al., 2024, Razavi et al., 18 Sep 2025).
- Multi-branch LSTM: Separate LSTMs handle state evolution, restoring forces, or hysteretic states; outputs and their time-differentiated forms feed into a composite loss (Zhang et al., 2020).
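To make the graph-LSTM variant concrete, the following NumPy sketch replaces the dense affine maps inside each gate with a GraphSAGE-style mean aggregation over node neighborhoods. The parameter layout (one self-weight/neighbor-weight pair per gate and input) is an illustrative assumption, not the formulation of any single cited paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sage_mean(X, adj, W_self, W_nbr):
    """GraphSAGE-style mean aggregation: combine each node's own features
    with the mean of its neighbors'. X: (n_nodes, d); adj: (n, n) in {0, 1}."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    return X @ W_self.T + ((adj @ X) / deg) @ W_nbr.T

def graph_lstm_step(X, H_prev, C_prev, adj, params):
    """One graph-LSTM step over all nodes at once.

    params maps each gate name in ("i", "f", "g", "o") to a 5-tuple
    (Wx_self, Wx_nbr, Wh_self, Wh_nbr, b): graph aggregation replaces
    the affine transform of both the input and the hidden state.
    """
    def pre(gate):
        Wx_s, Wx_n, Wh_s, Wh_n, b = params[gate]
        return (sage_mean(X, adj, Wx_s, Wx_n)
                + sage_mean(H_prev, adj, Wh_s, Wh_n) + b)
    i, f, o = sigmoid(pre("i")), sigmoid(pre("f")), sigmoid(pre("o"))
    g = np.tanh(pre("g"))
    C = f * C_prev + i * g       # per-node gated cell-state update
    H = o * np.tanh(C)
    return H, C
```

The temporal gating is unchanged from the standard LSTM; only the spatial mixing inside each gate differs, which is what lets these models respect mesh or sensor-network topology.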
3. Loss Formulation and Physics Constraints
A defining feature is the composite loss function
$$\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda\,\mathcal{L}_{\text{phys}},$$
where $\mathcal{L}_{\text{data}}$ is the empirical/data-driven error (e.g., MSE against observations) and $\mathcal{L}_{\text{phys}}$ penalizes violation of discretized or continuous governing equations (PDE/ODE residuals):
- State evolution constraint: Enforce the governing ODE $\dot{\mathbf{x}} = f(\mathbf{x}, t)$, or the corresponding finite-difference residuals, at collocation points (Halder et al., 2023, Biswas et al., 26 Nov 2025).
- Physics-informed regularizer: Penalize time/space derivatives of the LSTM outputs that deviate from physical models, e.g., energy conservation, constitutive or kinematic relations (Özalp et al., 2023, Lahariya et al., 2022, Zhang et al., 2020).
- Physical feature injection: Concatenate physical features, such as environmental covariates from weather or climate models, as node features in graph-based LSTMs (Liu et al., 2024).
- Boundary and symmetry constraints: Impose exact or regularized adherence to boundary conditions (e.g., no-slip, mass conservation) (Tao et al., 25 Dec 2025).
Loss weights are tuned to achieve a trade-off: the pure data loss minimizes empirical risk, but excessive down-weighting of the physics loss degrades physical fidelity.
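A minimal NumPy sketch of this composite loss, using a central finite-difference residual of a generic first-order ODE $\dot{x} = f(x)$ on a uniform time grid (the specific ODE $\dot{x} = -x$ and the weight $\lambda$ are illustrative choices, not from any cited paper):

```python
import numpy as np

def composite_loss(x_pred, x_obs, t, ode_rhs, lam=1.0):
    """L = L_data + lam * L_phys.

    L_data: MSE between predictions and observations.
    L_phys: mean squared central-difference residual of dx/dt = ode_rhs(x),
    evaluated at interior collocation points. Assumes a uniform grid t.
    """
    dt = t[1] - t[0]
    data_loss = np.mean((x_pred - x_obs) ** 2)
    dxdt = (x_pred[2:] - x_pred[:-2]) / (2 * dt)   # central difference
    residual = dxdt - ode_rhs(x_pred[1:-1])
    phys_loss = np.mean(residual ** 2)
    return data_loss + lam * phys_loss

# Example: dx/dt = -x with exact solution x(t) = exp(-t); the physics
# residual of the exact solution is near zero (only O(dt^2) error remains).
t = np.linspace(0.0, 2.0, 201)
x_exact = np.exp(-t)
loss = composite_loss(x_exact, x_exact, t, lambda x: -x)
```

The same pattern extends to PDE residuals by replacing the time stencil with spatio-temporal finite-difference (or autodiff) operators, and the residual can be evaluated at collocation points where no observations exist.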
4. Domain-Specific Applications and Quantitative Results
Physics-informed LSTM surrogates have been deployed in diverse domains:
| Domain (Task) | Model Variant | Key Physics Constraints | Reported Metrics / Results | Reference |
|---|---|---|---|---|
| Seismic response prediction of structures | U-Net-LSTM, finite-diff enforcement | ODE motion equations | Corr. up to 0.998; 2-3 orders speedup | (Biswas et al., 26 Nov 2025) |
| Nonlinear structural metamodeling | Multi-LSTM (PhyLSTM²/³) | EOM, kinematics, hysteresis law | Superior accuracy in >80% of runs | (Zhang et al., 2020) |
| Fluid thermodynamics (Rayleigh–Bénard convection) | ConvLSTM + CAE | PDE residuals (Navier–Stokes, etc.) | MSE = 0.02; 140× speedup | (Menicali et al., 16 May 2025) |
| Chaotic system forecasting | PI-LSTM | ODEs (Lorenz-96, etc.) | RMSE 0.3 for hidden states | (Özalp et al., 2023) |
| Energy systems / cooling towers | PhyLSTM | Conservation ODE | 2% RMSE, fast convergence | (Lahariya et al., 2022) |
| Electrohydrodynamics | LSTM-PINN | Steady PDEs with BCs | Stable up to 7e-3 LR, low final loss | (Tao et al., 25 Dec 2025) |
| Polar ice/cryosphere (graph) | GraphSAGE-LSTM with MAR features | Physics via node feature injection | 10% lower RMSE vs. non-physics GNN | (Liu et al., 2024) |
This table highlights the diversity of architecture and constraint choices, with consistent improvements over both baseline LSTM and traditional physics-agnostic ML or numerical baselines.
5. Training Paradigms and Implementation
Data regimes and hardware-bounded constraints motivate varied training procedures:
- Data splits: Cases with limited (10 records) versus abundant (50+) training examples; validation on held-out or cross-regime samples (Biswas et al., 26 Nov 2025).
- Optimization: The standard choice is the Adam optimizer with small, tuned learning rates, often followed by L-BFGS refinement in reduced configurations (Lahariya et al., 2022, Zhang et al., 2020).
- Minibatching/sequencing: Sequence length is a key design consideration; for spatial surrogates, batching over the graph or spatial dimensions is needed (Razavi et al., 18 Sep 2025, Liu et al., 2024).
- Automatic differentiation or explicit finite difference: Time/space derivatives for physics loss are often implemented via explicit finite-difference convolution (e.g., as in (Biswas et al., 26 Nov 2025)) or using PyTorch autograd on LSTM outputs (Lahariya et al., 2022, Menicali et al., 16 May 2025).
- Loss weighting and dynamic adjustment: Some models modulate data/physics loss influence using dynamic weight averaging or gradient norm rescaling for balance (e.g., (Menicali et al., 16 May 2025)).
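One common balancing scheme of the kind mentioned above, dynamic weight averaging, can be sketched in a few lines of NumPy; the temperature value and warm-up handling here are illustrative defaults, not taken from any cited paper:

```python
import numpy as np

def dwa_weights(loss_hist, T=2.0):
    """Dynamic weight averaging over K loss terms (data, physics, ...).

    loss_hist: list of per-epoch loss vectors, each of shape (K,); only the
    last two epochs are used. Terms whose loss is decreasing more slowly get
    larger weights, so no single term dominates training. T is a softmax
    temperature; the returned weights sum to K.
    """
    if len(loss_hist) < 2:
        return np.ones(len(loss_hist[-1]))     # warm-up: equal weights
    r = loss_hist[-1] / loss_hist[-2]          # relative descent rate per term
    e = np.exp(r / T)
    return len(r) * e / e.sum()                # softmax scaled to sum to K
```

At each epoch the training loop would recompute these weights and form the total loss as the weighted sum of the data and physics terms; gradient-norm rescaling is an alternative that balances per-term gradient magnitudes instead of loss ratios.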
6. Comparative Evaluation and Robustness
Quantitative studies consistently show:
- Accuracy & correlation: Physics-informed LSTMs achieve superior time-history correlation (up to $0.998$, versus roughly $0.7$ for physics-agnostic baselines) and lower relative or absolute prediction error, robustly across data regimes (Biswas et al., 26 Nov 2025, Özalp et al., 2023).
- Generalization: LSTM architectures regularized with physics losses extrapolate reliably to unseen regimes, with little degradation under data scarcity, outperforming competing black-box models (Zhang et al., 2020, Biswas et al., 26 Nov 2025).
- Stability: LSTM-PINN is observed to avoid divergence and boundary artifacts under aggressive learning rates and complex PDEs, in contrast to MLP-PINN approaches (Tao et al., 25 Dec 2025).
- Physical interpretability: Injection of domain constraints allows recovery of latent physical states (e.g., restoring force, hysteresis, or unmeasured variables) without direct supervision (Özalp et al., 2023, Zhang et al., 2020).
- Computational efficiency: Surrogate inference times are reduced by orders of magnitude compared to solver-based approaches (e.g., ~140× for turbulent RBC (Menicali et al., 16 May 2025), and 2–3 orders of magnitude for structural FEM (Biswas et al., 26 Nov 2025)).
7. Limitations, Extensions, and Outlook
Identified limitations and prospective extensions include:
- Expressivity and depth: Single-layer designs may be insufficient for capturing multiscale processes; deeper or hybrid LSTM-Transformer architectures, higher-resolution spatial encoders (e.g., vision transformers, U-Nets) are active areas of development (Menicali et al., 16 May 2025, Biswas et al., 26 Nov 2025).
- Physics loss formulation: Efficacy depends on precise formulation; choice between continuous (AD-based) vs. discretized (finite-diff or black-box) derivatives impacts training stability and computational burden (Halder et al., 2023).
- Constraint enforcement: Some architectures enforce physics only via auxiliary inputs, not direct loss regularization—a distinction critical in evaluating physical consistency (Liu et al., 2024).
- Scalability and complexity: Large graph sizes or high-resolution spatio-temporal domains present memory and training challenges; batching heuristics and 3D convs are practical remedies (Razavi et al., 18 Sep 2025, Menicali et al., 16 May 2025).
- Uncertainty quantification: Incorporation of conformal prediction or Bayesian layers is essential where deterministic long-horizon forecasts are ill-posed (Menicali et al., 16 May 2025).
- Application domains: Physics-informed LSTM surrogates continue to expand to new areas, e.g., hybrid flow/transport, geoscientific forecasting, thermal/electrohydrodynamic flows, and chaotic state reconstruction (Tao et al., 25 Dec 2025, Özalp et al., 2023).
Physics-informed LSTM models represent an intersection of deep sequence modeling and inductive scientific priors, enabling data-efficient, robust, and physically plausible surrogates across complex dynamical systems (Biswas et al., 26 Nov 2025, Lahariya et al., 2022, Özalp et al., 2023, Tao et al., 25 Dec 2025, Menicali et al., 16 May 2025, Zhang et al., 2020, Liu et al., 2024, Razavi et al., 18 Sep 2025, Halder et al., 2023).