Neural Fidelity Calibration (NFC)
- Neural Fidelity Calibration (NFC) is a framework for adaptive sim-to-real transfer in deep reinforcement learning, integrating online simulator calibration with residual uncertainty handling.
- It decomposes the sim-to-real gap into calibrated simulator dynamics and residual fidelity, enabling targeted policy fine-tuning and robust anomaly detection in complex robotic environments.
- Its conditional score-based diffusion models and sequential calibration strategy were evaluated on five robotic systems (Ant, Quadruped, Humanoid, Quadrotor, Jackal). In real-world rough-terrain navigation with a ClearPath Jackal, PPO with NFC reached a 72% success rate versus 25% for PPO without NFC.
Neural Fidelity Calibration (NFC) is a framework for adaptive and informative sim-to-real transfer in deep reinforcement learning for robotic locomotion and navigation. NFC integrates online calibration of simulator physical coefficients and residual fidelity domains via conditional score-based diffusion models. The key innovation is joint inference of physical parameter correction and residual uncertainty attributable to perception limitations, thereby enabling realistic environment sampling and targeted policy adaptation. NFC is designed to perform online calibration during execution, fine-tune policies under detected anomalous scenarios, and manage transition uncertainty through optimistic exploration. Its efficacy has been validated across high-dimensional robotic platforms and challenging real-world environments, notably improving robustness in sim-to-real adaptation (Yu et al., 11 Apr 2025).
1. Formalism of Residual Fidelity
NFC explicitly decomposes the sim-to-real gap into calibrated simulator dynamics and a residual fidelity term that captures both model misspecification and perception uncertainty. The real-world state transition is modeled from the following components:
- Simulator dynamics: black-box simulator dynamics governed by the calibration parameters.
- Residual fidelity shift: a parameterized shift representing unmodeled dynamics and environmental uncertainty.
- Process noise: zero-mean noise.
- Perceived environment: the environment state as perceived onboard.
Residual fidelity parameters comprise two parts:
- Residual dynamics: the part of the shift attributable to unmodeled dynamics.
- Residual environment: the part attributable to perception limitations, with the environment reconstructed via diffusion.
The residual fidelity at each time step is the joint of these two parts. This joint representation allows the diffusion model to sample plausible physical and perceptual scenarios for subsequent calibration and policy improvement.
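The decomposition in this section can be written compactly under assumed notation. This is a sketch only: the symbols ($s_t$ for state, $a_t$ for action, $\hat{e}_t$ for the perceived environment, $\theta$ for calibration parameters, $\phi_t$ for residual fidelity, $\epsilon_t$ for noise) and the additive form are illustrative assumptions, suggested by the word "shift" but not confirmed by the text.

```latex
% Assumed additive form: calibrated simulator + residual fidelity shift + zero-mean noise
s_{t+1} = f_{\mathrm{sim}}(s_t, a_t, \hat{e}_t;\, \theta)
        + \Delta(s_t, a_t, \hat{e}_t;\, \phi_t)
        + \epsilon_t,
\qquad \mathbb{E}[\epsilon_t] = 0

% Residual fidelity at time t as the pair of its dynamics and environment parts
\phi_t = \left(\phi_t^{\mathrm{dyn}},\; \phi_t^{\mathrm{env}}\right)
```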
2. Conditional Score-Based Diffusion Modeling
NFC uses a conditional score-based diffusion model to infer the posterior distribution over calibration and residual parameters given observed trajectories. The diffusion mechanism involves:
- Forward SDE: progressively perturbs the parameters with noise. In the discrete form, the context is set by trajectory and perception.
- Reverse SDE: the backward-time evolution is governed by the score function, which is parameterized by a neural network.
- Training objective: the score network is trained via score matching on synthetic samples.
This design enables NFC to stochastically sample physical and residual domains, adaptively responding to observed sim-to-real discrepancy.
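For orientation, the generic forms of these three ingredients in the score-based SDE literature are sketched below. The notation is assumed rather than taken from the source: $x$ stands for the stacked calibration and residual parameters, $c$ for the conditioning context (trajectory and perception), $s_\psi$ for the score network, and $\lambda(t)$ for a time weighting. NFC's specific drift, diffusion coefficient, and weighting may differ.

```latex
% Forward (noising) SDE
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w

% Reverse-time SDE, conditioned on context c
\mathrm{d}x = \left[ f(x, t) - g(t)^2\, \nabla_x \log p_t(x \mid c) \right] \mathrm{d}t
            + g(t)\,\mathrm{d}\bar{w}

% Denoising score-matching objective for the score network s_\psi
\mathcal{L}(\psi) = \mathbb{E}_{t,\, x_0,\, x_t}\!\left[ \lambda(t)\,
    \big\| s_\psi(x_t, t, c) - \nabla_{x_t} \log p_{0t}(x_t \mid x_0) \big\|_2^2 \right]
```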
3. Simulator Coefficient Calibration and Policy Adaptation
For each real trajectory, simulator coefficients and residual domains are inferred by sampling from the learned conditional posterior. The calibration workflow involves:
- Sampling: Generating posterior samples, each of which produces a corresponding environment with updated physical and residual parameters.
- Loss Function: Discrepancy minimization combined with Bayesian posterior maximization.
- Gradient Update: Parameters are refined via score-matching gradients.
This workflow bridges the sim-to-real gap by inferring optimal physics and environmental corrections for policy fine-tuning in the dynamically reconstructed domain.
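The sampling step can be illustrated with a toy reverse-time sampler. This is a minimal sketch under strong assumptions, not NFC's implementation: the learned score network is replaced by the analytic score of a one-dimensional Gaussian, the forward process is a variance-exploding SDE with constant diffusion, and all names (`analytic_score`, `sample_posterior`, `context_mean`) are hypothetical.

```python
import math
import random

def analytic_score(x, t, context_mean, data_std, noise_var_rate):
    """Score of the noised marginal N(context_mean, data_std^2 + rate * t).
    Stands in for a learned network s(x, t, c); the Gaussian form is a toy assumption."""
    var_t = data_std ** 2 + noise_var_rate * t
    return -(x - context_mean) / var_t

def sample_posterior(context_mean, n_samples=1000, n_steps=200,
                     data_std=0.5, noise_var_rate=25.0, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE for the
    forward process dx = g dw with g^2 = noise_var_rate."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    g2 = noise_var_rate
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, math.sqrt(g2))  # broad prior at t = 1
        for k in range(n_steps, 0, -1):
            t = k * dt
            score = analytic_score(x, t, context_mean, data_std, g2)
            # Step backward in time: drift along the score, then add noise.
            x = x + g2 * score * dt + math.sqrt(g2 * dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

samples = sample_posterior(context_mean=2.0)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
```

In this toy setting the samples concentrate around `context_mean`, mimicking how context-conditioned sampling yields candidate parameters from which environments could then be instantiated.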
4. Sequential Calibration and Online NFC Refinement
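NFC employs sequential posterior construction to efficiently utilize accumulated calibration knowledge:
- Prior proposal: the posterior obtained in the previous round serves as the proposal for drawing new parameters.
- Posterior refinement: the posterior is refined using the newly collected data.
This process requires retraining only on a compact, online dataset, improving computational efficiency and calibration accuracy. The sequential NFC update algorithm is:
| Step | Operation |
|---|---|
| 1. Sampling | Draw calibration and residual parameters from the current proposal |
| 2. Simulation | Collect simulated trajectories for the sampled parameters and compute the quantities needed for the update |
| 3. Update | Minimize the score-matching loss; set the refined posterior as the next proposal |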
This sequential framework maintains NFC adaptivity throughout deployment.
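The sample, simulate, update loop can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the NFC algorithm: it swaps the diffusion posterior for a Gaussian proposal refit on the closest samples (a cross-entropy-method-style stand-in), and the simulator, parameter, and function names are hypothetical.

```python
import random
import statistics

def simulate(theta, rng, noise_std=0.05):
    """Hypothetical simulator: observation is 2 * theta plus Gaussian noise."""
    return 2.0 * theta + rng.gauss(0.0, noise_std)

def sequential_calibration(real_obs, n_rounds=5, n_samples=200, n_elite=20, seed=0):
    """Each round: sample from the proposal, simulate, then refit the proposal
    on the samples whose simulated observation is closest to the real one."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # initial proposal (broad prior)
    for _ in range(n_rounds):
        # 1. Sampling: draw candidate parameters from the current proposal.
        thetas = [rng.gauss(mean, std) for _ in range(n_samples)]
        # 2. Simulation: roll out each candidate and score its discrepancy.
        scored = [(abs(simulate(th, rng) - real_obs), th) for th in thetas]
        elite = [th for _, th in sorted(scored)[:n_elite]]
        # 3. Update: the refit distribution becomes the next proposal.
        mean = statistics.mean(elite)
        std = max(statistics.pstdev(elite), 1e-3)
    return mean, std

true_theta = 0.7
est_mean, est_std = sequential_calibration(real_obs=2.0 * true_theta)
```

Because each round starts from the previous round's result, only a small batch of fresh simulations is needed per round, which is the efficiency argument made above.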
5. Anomalous Scenario Detection and Selective Policy Optimization
NFC includes an anomaly-driven fine-tuning protocol:
- Detection: A causal-TCN encoder encodes control and target trajectories. An anomaly is flagged when a score computed from the encodings exceeds a threshold determined via ROC analysis.
- Selective Fine-Tuning: Policy adaptation is triggered only under detected anomalous conditions.
This targeted adaptation avoids unnecessary policy updates, focusing computational resources on informative scenarios.
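A minimal sketch of threshold selection and gated fine-tuning follows. It assumes a scalar anomaly score is already available for each trajectory and picks the threshold by maximizing TPR minus FPR (Youden's J), which is one common ROC-based rule and not necessarily the one used by NFC. The scores and the `fine_tune` callback are made up for illustration.

```python
def roc_threshold(normal_scores, anomalous_scores):
    """Pick the threshold maximizing TPR - FPR (Youden's J),
    where a trajectory is flagged when score >= threshold."""
    best_thr, best_j = None, -1.0
    for thr in sorted(set(normal_scores) | set(anomalous_scores)):
        tpr = sum(s >= thr for s in anomalous_scores) / len(anomalous_scores)
        fpr = sum(s >= thr for s in normal_scores) / len(normal_scores)
        if tpr - fpr > best_j:
            best_thr, best_j = thr, tpr - fpr
    return best_thr

def maybe_fine_tune(score, threshold, fine_tune):
    """Run the (hypothetical) fine_tune callback only when an anomaly is flagged."""
    if score >= threshold:
        fine_tune()
        return True
    return False

# Hypothetical labeled scores used to calibrate the threshold.
normal = [0.10, 0.20, 0.15, 0.30]
anomalous = [0.80, 0.90, 0.70, 0.60]
threshold = roc_threshold(normal, anomalous)

calls = []
flagged_high = maybe_fine_tune(0.75, threshold, lambda: calls.append("tuned"))
flagged_low = maybe_fine_tune(0.20, threshold, lambda: calls.append("tuned"))
```

Only the high-scoring trajectory triggers the callback, so fine-tuning cost is paid only for flagged scenarios.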
6. Exploration Under Uncertainty via Hallucination
When NFC posterior uncertainty is large, the framework initiates hallucinated optimistic exploration: parameters are chosen around the posterior mean within a range scaled by the posterior covariance, and a hallucinated variable guides optimistic parameter sampling. The policy is then optimized jointly with this optimistic parameter choice. This procedure is intended to improve resilience to epistemic uncertainty in sim-to-real adaptation.
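A common way to realize hallucinated optimism is to pick parameters of the form mean plus a scaled standard deviation times a bounded variable in [-1, 1], choosing that variable to maximize the estimated return. The sketch below assumes this form for a scalar parameter. The names `beta`, `eta`, and `toy_return` are hypothetical, and a grid search stands in for the joint optimization with the policy.

```python
def optimistic_parameter(mu, sigma, beta, estimated_return, n_grid=21):
    """Search eta in [-1, 1] and return the candidate mu + beta * sigma * eta
    with the highest estimated return (scalar parameter for simplicity)."""
    best_theta, best_value = None, float("-inf")
    for i in range(n_grid):
        eta = -1.0 + 2.0 * i / (n_grid - 1)
        theta = mu + beta * sigma * eta
        value = estimated_return(theta)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

def toy_return(theta):
    """Hypothetical return estimate that peaks at theta = 0.6."""
    return -(theta - 0.6) ** 2

theta_opt = optimistic_parameter(mu=0.5, sigma=0.1, beta=2.0,
                                 estimated_return=toy_return)
```

The search is confined to the interval implied by the posterior spread, so optimism never selects parameters the posterior considers implausible.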
7. Evaluation and Empirical Findings
Calibration Accuracy
NFC demonstrates superior calibration precision for sim-to-real transfer. Average log-posterior scores comparing a mixture density network (MDN) baseline with NFC's SDE+TCN model on Ant, Quadruped, Humanoid, Quadrotor, and Jackal indicate consistent improvement across flat and rough terrains.
Anomaly Detection Results
Detection performance achieves TPR = 1.00 and FPR = 0.00 across five robotic systems, indicating high reliability for triggering targeted policy optimization.
Sim-Sim Policy Performance
Table A shows PPO+NFC outperforming PPO w/o ΔFidelity, PPO w/o NFC, and TD-MPC2 in normalized return for all benchmark robots (F = flat terrain, R = rough terrain):
| Robot | PPO w/ NFC | PPO w/o ΔFidelity | PPO w/o NFC | TD-MPC2 |
|---|---|---|---|---|
| Ant (F) | 0.98 | 0.88 | 0.64 | 0.87 |
| Ant (R) | 0.95 | 0.81 | 0.57 | 0.75 |
| Quad (F) | 0.67 | 0.33 | 0.17 | 0.42 |
| Quad (R) | 0.55 | 0.27 | 0.18 | 0.18 |
| Humanoid(F) | 0.96 | 0.89 | 0.55 | 0.88 |
| Humanoid(R) | 0.93 | 0.90 | 0.58 | 0.88 |
| Quadcopter(F) | 0.82 | 0.75 | 0.67 | 0.72 |
| Quadcopter(R) | 0.81 | 0.69 | 0.33 | 0.67 |
| Jackal (F) | 0.87 | 0.67 | 0.23 | 0.41 |
| Jackal (R) | 0.82 | 0.69 | 0.20 | 0.41 |
Real-World Navigation: ClearPath Jackal
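Empirical results for the ClearPath Jackal under flat and rough conditions (broken axle, snow, rocks) are summarized in Table B. On flat terrain every method reaches 100% success, with PPO w/ NFC showing the lowest orientation and positional jerk. On rough terrain PPO w/ NFC achieves the highest success rate (72%), the lowest trajectory ratio, and the lowest orientation and positional jerk: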
| Condition | Method | Success % | Traj-ratio | Ori-jerk | Pos-jerk |
|---|---|---|---|---|---|
| Flat | PPO w/ NFC | 100 | 1.8 | 2.98 | 4.36 |
| Flat | PPO w/o ΔF | 100 | 1.8 | 3.50 | 9.32 |
| Flat | PPO w/o NFC | 100 | 1.9 | 14.18 | 8.41 |
| Flat | TD-MPC2 | 100 | 1.5 | 6.95 | 5.27 |
| Flat | Falco | 100 | 1.6 | 41.9 | 6.95 |
| Rough | PPO w/ NFC | 72 | 2.1 | 3.19 | 6.26 |
| Rough | PPO w/o ΔF | 53 | 3.2 | 8.49 | 12.46 |
| Rough | PPO w/o NFC | 25 | 3.7 | 26.99 | 12.42 |
| Rough | TD-MPC2 | 42 | 2.7 | 8.98 | 18.46 |
| Rough | Falco | 28 | 3.6 | 41.93 | 8.87 |
Taken together, these findings substantiate NFC’s benefits in sim-to-real policy transfer, adaptive calibration, targeted fine-tuning, and robust operation under uncertainty (Yu et al., 11 Apr 2025).