PI-AttNP: Physics-Informed Attentive Neural Process

Updated 22 September 2025
  • The paper introduces PI-AttNP, which hybridizes Attentive Neural Processes with physics-informed priors to boost estimation accuracy and formal uncertainty quantification.
  • It employs dual encoding paths—deterministic cross-attention and latent stochastic aggregation—to fuse mechanistic models with data-driven residual learning.
  • Split conformal prediction is integrated to provide reliable uncertainty bounds, demonstrated through superior performance in grey-box quadrotor state estimation.

Physics-Informed Attentive Neural Process (PI-AttNP) is a hybrid modeling framework that combines the distributional learning and adaptive context selection of Attentive Neural Processes (AttNP) with explicit incorporation of physics-based priors. PI-AttNP leverages attention-driven latent function representations and incorporates simplified (often incomplete) physics-based models, typically in grey-box fashion, to enhance predictive accuracy, enable robust state estimation, and facilitate formal uncertainty quantification. The architecture is particularly suited to real-time estimation and control tasks in nonlinear dynamical systems subject to uncertainty, where both data-driven learning and physical model knowledge are required (Hunter et al., 15 Sep 2025).

1. Hybrid Model Design and Physical Prior Integration

PI-AttNP augments the classic Attentive Neural Process architecture with a physics-informed prior, denoted $g(\cdot)$. This prior typically derives from a low-fidelity, mechanistic model (e.g., simplified kinematic or dynamical equations) and provides a baseline a priori prediction for system state transitions. PI-AttNP explicitly models the predictive function as

$$\hat{f}_\Gamma(x, u, \Delta t) = g(x, u, \Delta t) + \mathrm{NN}(x, u, \Delta t \mid \Gamma)$$

where $g(\cdot)$ embeds physical constraints and $\mathrm{NN}(\cdot \mid \Gamma)$ is a neural network trained to capture residuals arising from sensor noise, external disturbances, and unmodeled nonlinearities.
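To make the composition concrete, the following is a minimal PyTorch sketch of the grey-box predictor above; the state/input dimensions, hidden width, and module names are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class GreyBoxPredictor(nn.Module):
    """Hybrid predictor f_hat = g + NN: a fixed physics prior g(.)
    plus a learned residual network (dimensions are illustrative)."""

    def __init__(self, physics_fn, state_dim=12, input_dim=4, hidden=128):
        super().__init__()
        self.physics_fn = physics_fn    # g(x, u, dt): low-fidelity mechanistic model
        self.residual = nn.Sequential(  # NN(x, u, dt | Gamma)
            nn.Linear(state_dim + input_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, x, u, dt):
        prior = self.physics_fn(x, u, dt)       # physically plausible baseline
        feats = torch.cat([x, u, dt], dim=-1)   # residual input: state, control, timestep
        return prior + self.residual(feats)     # data-driven correction of the prior
```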

The architecture employs a context-target formulation. Context sets $\phi_C = \{(x_c, y_c)\}$ are embedded via a multi-layer perceptron (MLP) into a global representation $R_C$. Two parallel encoding paths are established:

  • Deterministic Path: Employs cross-attention (multi-head scaled dot-product) via an encoder MLP $\Lambda(\cdot)$, generating a context-aware attention matrix $R_\Lambda$.
  • Latent Stochastic Path: Generates a permutation-invariant latent representation by aggregating MLP embeddings and encoding them with $\Phi(\cdot)$, defining prior and posterior distributions $p(z \mid \phi_C)$ and $p(z \mid \phi_T)$.

The decoder $\Theta(\cdot)$ combines the target input $x_T$, the latent sample $z$, the attention matrix $R_\Lambda$, and the a priori state estimate from the physics model. This synergy enforces physical plausibility while enabling adaptive, data-driven corrections.
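The dual-path structure can be sketched as follows; the layer sizes, the use of `nn.MultiheadAttention`, and the Gaussian output parameterization are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class AttNPCore(nn.Module):
    """Sketch of the dual-path AttNP core: a deterministic cross-attention
    path and a permutation-invariant latent path, fused by the decoder."""

    def __init__(self, x_dim, y_dim, r_dim=128, z_dim=64, heads=8):
        super().__init__()
        embed = lambda d: nn.Sequential(nn.Linear(d, r_dim), nn.ReLU(),
                                        nn.Linear(r_dim, r_dim))
        self.context_embed = embed(x_dim + y_dim)       # MLP over (x_c, y_c)
        self.query_embed = embed(x_dim)                 # Lambda(.): queries/keys
        self.cross_attn = nn.MultiheadAttention(r_dim, heads, batch_first=True)
        self.latent_head = nn.Linear(r_dim, 2 * z_dim)  # Phi(.): mean, log-var of z
        self.decoder = nn.Sequential(                   # Theta(.)
            nn.Linear(x_dim + r_dim + z_dim + y_dim, r_dim), nn.ReLU(),
            nn.Linear(r_dim, 2 * y_dim),                # predictive mean, log-var
        )

    def latent(self, x, y):
        r = self.context_embed(torch.cat([x, y], -1)).mean(dim=1)  # permutation-invariant
        mu, logvar = self.latent_head(r).chunk(2, -1)
        return torch.distributions.Normal(mu, (0.5 * logvar).exp())

    def forward(self, xc, yc, xt, y_prior):
        # Deterministic path: cross-attention from target queries to context
        rc = self.context_embed(torch.cat([xc, yc], -1))            # values
        r_attn, _ = self.cross_attn(self.query_embed(xt),
                                    self.query_embed(xc), rc)       # R_Lambda
        # Latent path: sample z from the context-conditioned distribution
        z = self.latent(xc, yc).rsample().unsqueeze(1).expand(-1, xt.size(1), -1)
        # Decoder fuses target input, attention summary, z, and the physics prior
        mean, logvar = self.decoder(
            torch.cat([xt, r_attn, z, y_prior], -1)).chunk(2, -1)
        return mean, (0.5 * logvar).exp()
```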

2. Training Objective and Algorithmic Structure

The learning objective of PI-AttNP is a variational evidence lower bound (ELBO) for the marginal likelihood of the target distribution, formalized as

$$\mathrm{ELBO} = \mathbb{E}_{z \sim q_\Phi(z \mid \phi_T)} \left[ \log p_\Theta(y_T \mid x_T, R_\Lambda, z, \hat{y}_T^-) \right] - D_{\mathrm{KL}}\left( q_\Phi(z \mid \phi_T) \,\Vert\, q_\Phi(z \mid \phi_C) \right)$$

where $p_\Theta$ parameterizes a (typically Gaussian) predictive distribution and the KL divergence regularizes latent-space consistency between the target- and context-encoded distributions.
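A sketch of the corresponding training loss, written against the illustrative module above (the interface is an assumption; the paper's own objective is the ELBO just stated):

```python
import torch

def neg_elbo(model, xc, yc, xt, yt, y_prior):
    """Negative ELBO: KL(q(z|phi_T) || q(z|phi_C)) minus the expected
    Gaussian log-likelihood of targets, estimated with one z sample."""
    q_t = model.latent(xt, yt)   # q_Phi(z | phi_T), available during training
    q_c = model.latent(xc, yc)   # q_Phi(z | phi_C)
    z = q_t.rsample().unsqueeze(1).expand(-1, xt.size(1), -1)

    # Deterministic cross-attention path, reused from the model sketch
    rc = model.context_embed(torch.cat([xc, yc], -1))
    r_attn, _ = model.cross_attn(model.query_embed(xt), model.query_embed(xc), rc)

    mean, logvar = model.decoder(torch.cat([xt, r_attn, z, y_prior], -1)).chunk(2, -1)
    pred = torch.distributions.Normal(mean, (0.5 * logvar).exp())

    log_lik = pred.log_prob(yt).sum(-1).mean()
    kl = torch.distributions.kl_divergence(q_t, q_c).sum(-1).mean()
    return kl - log_lik  # minimizing this maximizes the ELBO
```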

The training procedure, detailed in Algorithm 1 of (Hunter et al., 15 Sep 2025), involves:

  • Batch sampling of context data corrupted by sensor noise.
  • Calculation of apriori predictions via the physics model.
  • Forward propagation through deterministic and latent paths.
  • Probabilistic prediction via the decoder.
  • Backpropagation to update all parameters via Adam with prescribed learning rates and decay.

Explicit inclusion of the physics prior ensures the model's predictions remain physically plausible, even in low-data or high-noise regimes.
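Putting the pieces together, an illustrative training loop for the procedure above, continuing the sketches from the previous sections (`model`, `neg_elbo`, `physics_fn`, and the optimizer settings are placeholders; the paper prescribes its own learning rates and decay):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

for epoch in range(num_epochs):
    for xc, yc, xt, yt, (x, u, dt) in loader:  # noisy context/target batches
        y_prior = physics_fn(x, u, dt)         # a priori prediction from g(.)
        loss = neg_elbo(model, xc, yc, xt, yt, y_prior)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                           # learning-rate decay
```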

3. Uncertainty Quantification via Split Conformal Prediction

For safety-critical estimation, PI-AttNP augments its predictive distributions with a split conformal prediction (CP) framework. For each state variable $j$:

  • CP computes conformity scores over a calibration set:

$$s_i^{(j)} = \frac{\left(y_i^{(j)} - \hat{y}_i^{(j)}\right)^2}{\hat{\sigma}_i^{(j)}}$$

  • The empirical $(1-\alpha)$-quantile $q_\alpha^{(j)}$ is extracted such that coverage guarantees hold:

$$C_\alpha^{(j)}(x_T) = \left[\hat{y}_T^{(j)} - q_\alpha^{(j)} \hat{\sigma}_T^{(j)},\; \hat{y}_T^{(j)} + q_\alpha^{(j)} \hat{\sigma}_T^{(j)}\right]$$

  • During inference, the predicted variance $\hat{\sigma}_T^2$ is scaled by $q_\alpha$, and this interval provides formal coverage guarantees (with exact bounds depending on calibration-set size, per theorems presented in the paper). A calibration sketch follows this list.
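A minimal NumPy sketch of per-variable split-conformal calibration. For consistency with the interval form above it uses the normalized absolute residual $|y - \hat{y}| / \hat{\sigma}$ as the score, a common variant of the squared-residual score the paper defines.

```python
import numpy as np

def calibrate_quantile(y_cal, mu_cal, sigma_cal, alpha=0.1):
    """Split CP calibration for one state variable: the finite-sample
    corrected (1 - alpha)-quantile of normalized residual scores."""
    scores = np.abs(y_cal - mu_cal) / sigma_cal
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def conformal_interval(mu_test, sigma_test, q):
    """Interval y_hat +/- q * sigma_hat with marginal coverage >= 1 - alpha."""
    return mu_test - q * sigma_test, mu_test + q * sigma_test
```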

Furthermore, the scaled uncertainty is used to compute an adaptive fusion weight $\beta_k$ for recursive estimation, combining the model prediction and each new sensor observation in proportion to their respective uncertainties, as detailed in Algorithm 2 of (Hunter et al., 15 Sep 2025).
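The fusion step can be illustrated with standard inverse-variance weighting; the exact rule for $\beta_k$ is given in the paper's Algorithm 2, so the formula below is an assumption of the usual precision-weighted form.

```python
def fuse(y_pred, var_pred, y_meas, var_meas):
    """Precision-weighted fusion of model prediction and sensor observation:
    beta_k -> 1 as the prediction's variance shrinks relative to the sensor's."""
    beta = var_meas / (var_pred + var_meas)
    return beta * y_pred + (1.0 - beta) * y_meas
```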

4. Application to Nonlinear Grey-Box Quadrotor State Estimation

The primary application in (Hunter et al., 15 Sep 2025) is grey-box state estimation for a simulated six-DoF underactuated quadrotor subjected to multimodal Gaussian sensor noise, external forces (e.g., wind), and aggressive perturbations (e.g., rotor spiking). The state vector comprises translational/rotational velocities and accelerations, and sensor measurements are systematically corrupted.

The simplified physics model $g(\cdot)$ encodes standard mechanics (e.g., gravity, thrust, torque) but does not capture noise or unmodeled dynamics. PI-AttNP is trained to fit the residuals, effectively learning hard-to-model nonlinearities and disturbances.
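For intuition, a toy prior of this kind might look as follows; the state layout, scalar inertia, and forward-Euler integration are illustrative assumptions, not the paper's quadrotor model.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2, world frame

def g_prior(v, omega, thrust, torque, mass, inertia, dt):
    """Toy low-fidelity prior: propagate linear/angular velocity under gravity,
    thrust, and torque only; drag, wind, and sensor noise are ignored and
    left to the learned residual."""
    accel = GRAVITY + thrust / mass  # thrust assumed already in world frame
    alpha = torque / inertia         # scalar/diagonal inertia assumed
    return v + accel * dt, omega + alpha * dt  # forward-Euler step
```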

Empirical results demonstrate:

  • The achieved RMSE for state estimation is $9.8 \pm 5.3$, lower than alternatives such as AttNP and PINN-LSTM.
  • Negative log-likelihood is minimized, reflecting improved probabilistic calibration.
  • Trajectories tracked by PI-AttNP exhibit close adherence to ground truth, and the conformal intervals yield statistically calibrated coverage.

Model inference is lightweight (approximately 192K parameters) and suitable for real-time operation, in contrast with heavier baselines such as the Deep Kalman Filter (>3M parameters).
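The footprint claim is straightforward to audit for any PyTorch instantiation, e.g.:

```python
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e3:.0f}K parameters")  # ~192K reported for PI-AttNP
```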

5. Comparative Analysis with State-of-the-Art Methods

PI-AttNP is benchmarked against several key baselines:

  • Vanilla AttNP: A purely data-driven AttNP without physical priors; it generally yields higher estimation errors in the presence of severe noise or external perturbations.
  • DKF: The Deep Kalman Filter integrates deep learning and recursive Bayesian estimation but incurs substantially greater computational and parametric overhead.
  • PINN-LSTM with Bayesian layer: Incorporates physics constraints and sequential modeling but exhibits higher RMSE and NLL under external disturbances.
  • UKF: A hand-tuned Unscented Kalman Filter with fixed noise covariance; it lacks adaptive capability and data-driven correction.

Comparison reveals that PI-AttNP’s hybridization of attentive latent encoding, physical prior integration, and formal uncertainty quantification results in superior estimation accuracy, robustness to uncertainty, and computational efficiency. CP-driven interval calibration offers formal guarantees not available in uncalibrated approaches.

6. Implications and Future Directions

The PI-AttNP framework embodies the synergy between data-driven latent representation learning, attentive context selection, explicit physics prior utilization, and rigorous uncertainty quantification via conformal prediction. Potential implications include:

  • Extension to domains where physics-based models are available but incomplete, including process control, robotics, autonomous vehicles, and multi-physics simulations.
  • Deployment in settings demanding real-time inference with reliable error margins, especially for safety-critical applications where uncertainty must be tightly controlled.
  • A template for fusing mechanistic modeling and deep learning, advancing grey-box strategies for nonlinear system estimation.

A plausible implication is that PI-AttNP offers not only enhanced prediction capabilities but also a principled pathway to integrate physical knowledge with machine learning in a modular, interpretable manner. The explicit treatment of uncertainty is both foundational for reliable deployment and extensible to broader scientific machine learning contexts.
