
Neural Fidelity Calibration (NFC)

Updated 14 December 2025
  • Neural Fidelity Calibration (NFC) is a framework for adaptive sim-to-real transfer in deep reinforcement learning, integrating online simulator calibration with residual uncertainty handling.
  • It decomposes the sim-to-real gap into calibrated simulator dynamics and residual fidelity, enabling targeted policy fine-tuning and robust anomaly detection in complex robotic environments.
  • Its use of conditional score-based diffusion models and sequential calibration strategies has been validated through high-dimensional experiments, demonstrating significant improvements in real-world robotic performance.

Neural Fidelity Calibration (NFC) is a framework for adaptive and informative sim-to-real transfer in deep reinforcement learning for robotic locomotion and navigation. NFC integrates online calibration of simulator physical coefficients and residual fidelity domains via conditional score-based diffusion models. The key innovation is joint inference of physical parameter correction and residual uncertainty attributable to perception limitations, thereby enabling realistic environment sampling and targeted policy adaptation. NFC is designed to perform online calibration during execution, fine-tune policies under detected anomalous scenarios, and manage transition uncertainty through optimistic exploration. Its efficacy has been validated across high-dimensional robotic platforms and challenging real-world environments, notably improving robustness in sim-to-real adaptation (Yu et al., 11 Apr 2025).

1. Formalism of Residual Fidelity

NFC explicitly decomposes the sim-to-real gap into calibrated simulator dynamics and a residual fidelity term that captures both model misspecification and perception uncertainty. The real-world state transition is modeled as

$$s_{t+1}^{\rm real} = f_{\psi}(s_t, a_t \mid e_t) + g_{\phi}(s_t, a_t \mid e_t) + \epsilon_t$$

where:

  • $f_{\psi}$: black-box simulator dynamics with calibration parameters $\psi$.
  • $g_{\phi}$: residual fidelity shift, parameterized by $\phi$, representing unmodeled dynamics and environmental uncertainty.
  • $\epsilon_t$: zero-mean process noise.
  • $e_t$: onboard perceived environment state.

Residual fidelity parameters $\phi = \{\Delta s, \Delta e\}$ are defined as:

  • Residual dynamics: $\Delta s_t = s_{t+1}^{\rm real} - f_{\psi}(s_t, a_t)$
  • Residual environment: $\Delta e_t = \hat e_t - e_t$, with $\hat e_t$ reconstructed via diffusion.

Thus, the residual fidelity at time $t$ is

$$r_t = (\Delta s_t, \Delta e_t) = \left(s_{t+1}^{\rm real} - f_{\psi}(s_t, a_t),\; \hat e_t - e_t\right)$$

This joint representation allows the diffusion model to sample plausible physical and perceptual scenarios for subsequent calibration and policy improvement.
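As a concrete illustration of this decomposition, the Python sketch below computes $r_t = (\Delta s_t, \Delta e_t)$ for a single real-world transition. The callables `sim_step` and `reconstruct_env` are hypothetical stand-ins for the calibrated simulator $f_{\psi}$ and the diffusion-based environment reconstruction; nothing here is the paper's API.

```python
import numpy as np

def residual_fidelity(s_t, a_t, s_next_real, e_t, sim_step, reconstruct_env):
    """Compute r_t = (Δs_t, Δe_t) for one observed transition.

    sim_step(s, a)      -> next state predicted by calibrated dynamics f_ψ.
    reconstruct_env(e)  -> diffusion-reconstructed environment ê_t.
    Both callables are placeholders for NFC's learned components.
    """
    delta_s = s_next_real - sim_step(s_t, a_t)   # residual dynamics Δs_t
    delta_e = reconstruct_env(e_t) - e_t         # residual environment Δe_t
    return delta_s, delta_e

# Toy usage with a linear "simulator" and a near-identity reconstruction.
r_t = residual_fidelity(
    s_t=np.zeros(3),
    a_t=np.ones(3),
    s_next_real=np.array([0.10, -0.20, 0.05]),
    e_t=np.zeros(4),
    sim_step=lambda s, a: s + 0.1 * a,
    reconstruct_env=lambda e: e + 0.01,
)
```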

2. Conditional Score-Based Diffusion Modeling

NFC uses a conditional score-based diffusion model for inference over the posterior distribution of calibration ($\psi$) and residual ($\phi$) parameters given observed trajectories $\tau$, written $q(\psi, \phi \mid \tau)$. The diffusion mechanism involves:

  • Forward SDE:

$$\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w_t$$

In discrete form, $q(x_t \mid x_0) = \mathcal{N}(x_t;\, \alpha_t x_0,\, \sigma_t^2 I)$, with context $c$ set by the trajectory and perception.

  • Reverse SDE: The backward-time evolution is governed by

$$\mathrm{d}x = \left[f(x,t) - g(t)^2\, \nabla_{x_t} \log p_t(x_t \mid c)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar w_t$$

The score function $\nabla_{x_t} \log p_t(x_t \mid c)$ is parameterized by a neural network $s_{\theta}$.

  • Training Objective:

$$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, \epsilon} \left\| s_{\theta}(x_t, t \mid c) - \nabla_{x_t} \log q(x_t \mid x_0) \right\|^2$$

which is solved via score-matching on synthetic samples.

This design enables NFC to stochastically sample physical and residual domains, adaptively responding to observed sim-to-real discrepancy.
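To make the training objective concrete, here is a minimal denoising score matching sketch in PyTorch. The two-layer MLP, the cosine noise schedule, and conditioning by simple concatenation are illustrative assumptions, not the paper's architecture (which conditions on trajectory and perception context):

```python
import math
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Toy conditional score network s_θ(x_t, t | c) over x = (ψ, φ)."""
    def __init__(self, x_dim, c_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + c_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x, t, c):
        return self.net(torch.cat([x, t, c], dim=-1))

def dsm_loss(model, x0, c, t_min=1e-3):
    """Denoising score matching: with x_t = α_t x0 + σ_t ε, the target
    score is ∇_{x_t} log q(x_t | x0) = -ε / σ_t."""
    t = torch.rand(x0.shape[0], 1) * (1 - t_min) + t_min
    alpha = torch.cos(0.5 * math.pi * t)   # assumed VP-style schedule
    sigma = torch.sin(0.5 * math.pi * t)
    eps = torch.randn_like(x0)
    xt = alpha * x0 + sigma * eps
    return ((model(xt, t, c) + eps / sigma) ** 2).mean()

# One gradient step on synthetic (x0, c) pairs from simulator rollouts.
model = ScoreNet(x_dim=8, c_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x0, c = torch.randn(32, 8), torch.randn(32, 16)
opt.zero_grad()
dsm_loss(model, x0, c).backward()
opt.step()
```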

3. Simulator Coefficient Calibration and Policy Adaptation

For each real trajectory $\tau$, simulator coefficients and residual domains are inferred by sampling

$$(\psi, \phi) \sim q(\psi, \phi \mid \tau)$$

The calibration workflow involves:

  • Sampling: Generating $N$ samples $(\psi_i, \phi_i)$ to produce corresponding environments with updated physical and residual parameters.
  • Loss Function: Discrepancy minimization

$$\mathcal{L}_{\rm fidelity} = \mathbb{E}_i \left\| \tau^{\rm sim}_i - \tau^{\rm real} \right\|^2$$

with Bayesian posterior maximization

$$\psi^*, \phi^* = \arg\max_{\psi, \phi} \frac{1}{N} \sum_{i=1}^N \log q(\psi_i, \phi_i \mid \tau_i^{\rm real})$$

  • Gradient Update: Parameters are refined via score-matching gradients.

This workflow bridges the sim-to-real gap by inferring optimal physics and environmental corrections for policy fine-tuning in the dynamically reconstructed domain.
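A minimal sketch of this calibration loop, under the assumption that `posterior_sample` draws from the learned diffusion posterior and `simulate` rolls out the recalibrated simulator (both hypothetical names):

```python
import numpy as np

def calibrate(posterior_sample, simulate, tau_real, n_samples=64):
    """Draw N candidate (ψ, φ) pairs, roll each out in simulation, and
    score by trajectory discrepancy; returns a MAP-style best candidate
    and the mean fidelity loss over the batch."""
    candidates, losses = [], []
    for _ in range(n_samples):
        psi, phi = posterior_sample(tau_real)        # (ψ_i, φ_i) ~ q(·|τ)
        tau_sim = simulate(psi, phi)                 # rollout in updated sim
        losses.append(np.sum((tau_sim - tau_real) ** 2))
        candidates.append((psi, phi))
    best = int(np.argmin(losses))                    # lowest-discrepancy pair
    return candidates[best], float(np.mean(losses))  # (ψ*, φ*), L_fidelity
```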

4. Sequential Calibration and Online NFC Refinement

NFC employs sequential posterior construction to efficiently utilize accumulated calibration knowledge:

  • Prior proposal: $\tilde p(\psi, \phi) = q_{\rm old}(\psi, \phi \mid \tau)$
  • Posterior refinement:

$$q_{\rm new}(\psi, \phi \mid \tau) \propto q_{\rm true}(\psi, \phi \mid \tau) \cdot \frac{\tilde p(\psi, \phi)}{p(\psi, \phi)}$$

This process requires retraining only on a compact, online dataset, improving computational efficiency and calibration accuracy. The sequential NFC update algorithm is:

| Step | Operation |
|---|---|
| 1. Sampling | $(\psi_i, \phi_i) \sim q_{\rm old}(\psi, \phi \mid \tau_j)$ |
| 2. Simulation | Collect $\tau^{\rm sim}_{ij}$; compute $\ell_{ij} = \lVert \tau^{\rm sim}_{ij} - \tau_j \rVert^2$ |
| 3. Update | Minimize $\mathcal{L}_{\rm fidelity} = \frac{1}{MN} \sum_{j,i} \ell_{ij}$; set $q_{\rm new}$ |

This sequential framework maintains NFC adaptivity throughout deployment.
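The table's three steps map onto a short refinement loop; in this sketch `q_old.sample` and `q_old.refit` are hypothetical interfaces standing in for the score-based posterior and its score-matching update:

```python
def sequential_nfc_update(q_old, simulate, real_trajs, n_samples=32):
    """One round of sequential NFC refinement over M online trajectories."""
    dataset = []
    for tau_j in real_trajs:                     # M real trajectories
        for _ in range(n_samples):               # N posterior samples each
            psi, phi = q_old.sample(tau_j)       # step 1: (ψ_i, φ_i) ~ q_old(·|τ_j)
            tau_sim = simulate(psi, phi)         # step 2: collect τ_ij^sim
            loss_ij = float(((tau_sim - tau_j) ** 2).sum())
            dataset.append((psi, phi, tau_j, loss_ij))
    return q_old.refit(dataset)                  # step 3: minimize L_fidelity -> q_new
```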

5. Anomalous Scenario Detection and Selective Policy Optimization

NFC includes an anomaly-driven fine-tuning protocol:

  • Detection: A causal-TCN encoder ${\cal F}_\theta$ encodes control ($\tau^{\mathfrak c}$) and target ($\tau^{\mathfrak t}$) trajectories. An anomaly is flagged when

$$\cos\left({\cal F}_\theta(\tau^{\mathfrak c}),\, {\cal F}_\theta(\tau^{\mathfrak t})\right) < \delta_{\rm anom}$$

with threshold determined via ROC analysis.

  • Selective Fine-Tuning: Policy adaptation via

$$\pi^* = \arg\max_{\pi}\; \mathbb{E}_{(\psi, \phi) \sim q}\; \mathbb{E}_{s_{t+1} \sim p_{\psi, \phi}} \left[ \sum_{t=0}^T \gamma^t r(s_t, a_t) \right]$$

is triggered only under detected anomalous conditions.

This targeted adaptation avoids unnecessary policy updates, focusing computational resources on informative scenarios.
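A sketch of the detection test, assuming `encoder` is the trained causal-TCN ${\cal F}_\theta$ and using an illustrative threshold (the paper selects $\delta_{\rm anom}$ via ROC analysis):

```python
import torch
import torch.nn.functional as F

def is_anomalous(encoder, tau_control, tau_target, delta_anom=0.9):
    """Flag an anomaly when cos(F_θ(τ^c), F_θ(τ^t)) < δ_anom."""
    with torch.no_grad():
        z_c = encoder(tau_control)   # embedding of control trajectory τ^c
        z_t = encoder(tau_target)    # embedding of target trajectory τ^t
    sim = F.cosine_similarity(z_c, z_t, dim=-1)
    return bool((sim < delta_anom).any())
```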

6. Exploration Under Uncertainty via Hallucination

When the NFC posterior uncertainty ($\lVert \Sigma \rVert$) is large, the framework initiates hallucinated optimistic exploration:

$$(\psi, \phi) \sim \mu(\psi, \phi) + \Sigma(\psi, \phi) \cdot \pi_{\mathfrak h}(s_t)$$

where $\mu$ and $\Sigma$ are the posterior mean and covariance, and $\pi_{\mathfrak h}$ guides optimistic parameter sampling. The joint policy optimization is

$$\pi^*, \pi_{\mathfrak h}^* = \arg\max_{\pi, \pi_{\mathfrak h}} \mathbb{E}_{(\psi, \phi) \sim \rm hallucinated} \left[ \sum_{t=0}^T \gamma^t r(s_t, a_t) \right]$$

This procedure confers greater resilience to epistemic uncertainty in sim-to-real adaptation.
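The optimistic draw can be sketched as below, assuming a diagonal posterior covariance and a hallucination policy whose output is bounded in $[-1, 1]^d$; the names are illustrative, not the paper's:

```python
import numpy as np

def hallucinated_sample(mu, cov_diag, pi_h, s_t):
    """Optimistic parameter draw (ψ, φ) = μ + Σ · π_h(s_t)."""
    eta = np.clip(pi_h(s_t), -1.0, 1.0)   # bounded hallucination "action"
    return mu + cov_diag * eta            # shift within one posterior std

# Toy usage: two calibration parameters, a constant optimistic policy.
theta = hallucinated_sample(
    mu=np.array([1.0, 0.5]),
    cov_diag=np.array([0.2, 0.1]),
    pi_h=lambda s: np.array([0.8, -0.3]),
    s_t=np.zeros(3),
)
```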

7. Evaluation and Empirical Findings

Calibration Accuracy

NFC demonstrates superior calibration precision for sim-to-real transfer: its score-based diffusion formulation (SDE+TCN) attains higher average log-posterior scores than an MDN baseline on Ant, Quadruped, Humanoid, Quadrotor, and Jackal, across both flat and rough terrains.

Anomaly Detection Results

Detection performance achieves TPR = 1.00 and FPR = 0.00 across five robotic systems, indicating high reliability for triggering targeted policy optimization.

Sim-Sim Policy Performance

Table A shows PPO+NFC outperforming PPO w/o ΔFidelity, PPO w/o NFC, and TD-MPC2 in normalized return for all benchmark robots (F = flat terrain, R = rough terrain):

| Robot | PPO w/ NFC | w/o ΔFidelity | w/o NFC | TD-MPC2 |
|---|---|---|---|---|
| Ant (F) | 0.98 | 0.88 | 0.64 | 0.87 |
| Ant (R) | 0.95 | 0.81 | 0.57 | 0.75 |
| Quadruped (F) | 0.67 | 0.33 | 0.17 | 0.42 |
| Quadruped (R) | 0.55 | 0.27 | 0.18 | 0.18 |
| Humanoid (F) | 0.96 | 0.89 | 0.55 | 0.88 |
| Humanoid (R) | 0.93 | 0.90 | 0.58 | 0.88 |
| Quadcopter (F) | 0.82 | 0.75 | 0.67 | 0.72 |
| Quadcopter (R) | 0.81 | 0.69 | 0.33 | 0.67 |
| Jackal (F) | 0.87 | 0.67 | 0.23 | 0.41 |
| Jackal (R) | 0.82 | 0.69 | 0.20 | 0.41 |

Real-World Navigation: ClearPath Jackal

Empirical results for the ClearPath Jackal under flat and rough conditions (broken axle, snow, rocks), summarized in Table B, show that NFC-driven PPO achieves the highest success rates and the lowest orientation and position jerk under challenging conditions:

| Condition | Method | Success (%) | Traj. ratio | Ori. jerk | Pos. jerk |
|---|---|---|---|---|---|
| Flat | PPO w/ NFC | 100 | 1.8 | 2.98 | 4.36 |
| Flat | PPO w/o ΔF | 100 | 1.8 | 3.50 | 9.32 |
| Flat | PPO w/o NFC | 100 | 1.9 | 14.18 | 8.41 |
| Flat | TD-MPC2 | 100 | 1.5 | 6.95 | 5.27 |
| Flat | Falco | 100 | 1.6 | 41.9 | 6.95 |
| Rough | PPO w/ NFC | 72 | 2.1 | 3.19 | 6.26 |
| Rough | PPO w/o ΔF | 53 | 3.2 | 8.49 | 12.46 |
| Rough | PPO w/o NFC | 25 | 3.7 | 26.99 | 12.42 |
| Rough | TD-MPC2 | 42 | 2.7 | 8.98 | 18.46 |
| Rough | Falco | 28 | 3.6 | 41.93 | 8.87 |

Taken together, these findings substantiate NFC’s benefits in sim-to-real policy transfer, adaptive calibration, targeted fine-tuning, and robust operation under uncertainty (Yu et al., 11 Apr 2025).

References

  • Yu et al., Neural Fidelity Calibration, 11 Apr 2025.
