Addressing Misspecification in Simulation-based Inference through Data-driven Calibration (2405.08719v1)

Published 14 May 2024 in stat.ML, cs.LG, and stat.ME

Abstract: Driven by steady progress in generative modeling, simulation-based inference (SBI) has enabled inference over stochastic simulators. However, recent work has demonstrated that model misspecification can harm SBI's reliability. This work introduces robust posterior estimation (ROPE), a framework that overcomes model misspecification with a small real-world calibration set of ground truth parameter measurements. We formalize the misspecification gap as the solution of an optimal transport problem between learned representations of real-world and simulated observations. Assuming the prior distribution over the parameters of interest is known and well-specified, our method offers a controllable balance between calibrated uncertainty and informative inference under all possible misspecifications of the simulator. Our empirical results on four synthetic tasks and two real-world problems demonstrate that ROPE outperforms baselines and consistently returns informative and calibrated credible intervals.


Summary

The paper introduces ROPE, a novel framework that mitigates simulation misspecification by calibrating posterior estimates with real-world data.
It integrates neural posterior estimation with optimal transport theory using entropic regularization to align simulated and observed data representations.
Empirical tests on synthetic and real-world benchmarks demonstrate ROPE’s superior performance in providing calibrated credible intervals.


This paper presents a framework for making simulation-based inference (SBI) reliable under model misspecification. SBI methods can fail when the stochastic simulator does not faithfully reproduce the real-world process it is meant to model. The paper introduces robust posterior estimation (ROPE), which corrects for this misspecification using a small real-world calibration set: real observations paired with ground-truth measurements of the parameters of interest.

The authors formalize the misspecification gap as the solution of an optimal transport (OT) problem between learned representations of real-world and simulated observations. The method assumes that the prior distribution over the parameters of interest is known and well specified. Under that assumption, ROPE offers a controllable balance between calibrated uncertainty and informative inference, whatever form the simulator's misspecification takes.
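For reference, the standard entropy-regularized OT problem between n real-world representations z^o_i and m simulated representations z^s_j is written below. The uniform weights a and b and the squared Euclidean cost are assumptions made for illustration, not details confirmed by the summary.

```latex
P^\star_\varepsilon = \operatorname*{arg\,min}_{P \in \Pi(\mathbf{a},\mathbf{b})}
  \sum_{i=1}^{n}\sum_{j=1}^{m} P_{ij}\, C_{ij} \;-\; \varepsilon\, H(P),
\qquad C_{ij} = \lVert z^{o}_i - z^{s}_j \rVert_2^2,
\qquad H(P) = -\sum_{i,j} P_{ij}\,(\log P_{ij} - 1),

\Pi(\mathbf{a},\mathbf{b}) = \{\, P \in \mathbb{R}_{\ge 0}^{n \times m} : P\mathbf{1}_m = \mathbf{a},\; P^\top \mathbf{1}_n = \mathbf{b} \,\},
\qquad \mathbf{a} = \tfrac{1}{n}\mathbf{1}_n,\quad \mathbf{b} = \tfrac{1}{m}\mathbf{1}_m .
```

As ε → 0 this recovers unregularized OT. As ε → ∞ the optimal plan tends to the independent coupling a bᵀ, which links every real observation equally to every simulated one.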

The key contribution is an algorithm that combines neural posterior estimation (NPE) with OT theory, modeling the misspecification as an OT coupling between representations of real and simulated observations. In experiments on four synthetic tasks and two real-world problems, ROPE outperforms the baselines and consistently returns informative and calibrated credible intervals.
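One way to turn such a coupling into a posterior is shown in the minimal sketch below. It rests on two hypothetical simplifications. First, the NPE posterior for each simulated observation j is summarized as a Gaussian with mean `means[j]` and standard deviation `stds[j]` (practical NPE usually relies on normalizing flows). Second, the posterior for a real observation is the mixture of these NPE posteriors, weighted by that observation's row of the transport plan. The function names are illustrative and are not the authors' code.

```python
import numpy as np

def mixture_moments(plan_row, means, stds):
    """Mean and variance of sum_j w_j * N(means[j], stds[j]**2), where w is the
    plan row normalized to sum to 1 (hypothetical Gaussian NPE posteriors)."""
    w = np.asarray(plan_row, dtype=float)
    w = w / w.sum()
    mean = np.sum(w * means)
    second_moment = np.sum(w * (stds**2 + means**2))
    return float(mean), float(second_moment - mean**2)

def sample_mixture(plan_row, means, stds, n_samples, rng):
    """Sample theta: pick simulated observation j with probability w_j,
    then draw from its (hypothetical Gaussian) NPE posterior."""
    w = np.asarray(plan_row, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(w), size=n_samples, p=w)
    return rng.normal(means[idx], stds[idx])

# Usage: a real observation coupled equally to two simulated observations.
plan_row = np.array([0.25, 0.25])   # hypothetical, unnormalized plan row
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])
print(mixture_moments(plan_row, means, stds))  # (1.0, 2.0)
samples = sample_mixture(plan_row, means, stds, 10_000, np.random.default_rng(0))
```

Because the mixture variance includes the spread of the component means, a plan row that spreads its mass over many simulated observations produces a wider posterior than one concentrated on a single observation.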

The robustness of ROPE is demonstrated on four synthetic tasks and two real-world problems, which together span a range of applications and types of model misspecification. The synthetic benchmarks include tasks such as epidemic modeling and pendulum dynamics, while the real-world benchmarks involve physical systems such as a light tunnel and a wind tunnel. Beyond the method itself, the paper therefore extends the set of benchmarks available for evaluating SBI under misspecified models.
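Calibration of credible intervals is commonly checked through empirical coverage. Across many test cases, a central 90% credible interval should contain the ground-truth parameter about 90% of the time. The sketch below is a generic version of that check and is not necessarily the exact metric reported in the paper. `empirical_coverage` and the conjugate Gaussian toy model are illustrative assumptions.

```python
import numpy as np

def empirical_coverage(true_thetas, posterior_samples, level):
    """Fraction of ground-truth parameters lying inside the central `level`
    credible interval estimated from their own posterior samples.

    true_thetas: shape (N,), one ground-truth parameter per test case.
    posterior_samples: shape (N, S), S posterior samples per test case.
    """
    lo = np.quantile(posterior_samples, (1.0 - level) / 2.0, axis=1)
    hi = np.quantile(posterior_samples, (1.0 + level) / 2.0, axis=1)
    return float(np.mean((true_thetas >= lo) & (true_thetas <= hi)))

# Toy check with a conjugate Gaussian model (illustrative assumption):
# theta ~ N(0, 1), x | theta ~ N(theta, 1)  =>  exact posterior N(x / 2, 1 / 2).
rng = np.random.default_rng(0)
theta = rng.normal(size=2000)
x = theta + rng.normal(size=2000)
post_std = np.sqrt(0.5)
exact = rng.normal(x[:, None] / 2, post_std, size=(2000, 1000))
overconfident = rng.normal(x[:, None] / 2, 0.5 * post_std, size=(2000, 1000))
print(empirical_coverage(theta, exact, 0.9))          # should be close to 0.9
print(empirical_coverage(theta, overconfident, 0.9))  # should be well below 0.9
```

A calibrated posterior tracks the nominal level. An overconfident posterior, such as one that ignores simulator misspecification, produces intervals that are too narrow and under-cover.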

Modeling misspecification with OT lets the framework estimate posteriors for real-world observations from simulator outputs, which are realigned with those observations through the learned coupling. Entropic regularization of the OT problem is an essential part of the method: it gives practitioners control over the trade-off between calibration and informativeness of the resulting posteriors.
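A small experiment illustrates the effect of the entropic regularization strength ε. The sketch below computes Sinkhorn plans (Sinkhorn is a standard solver for entropic OT) between hypothetical 2-D representations of real and simulated observations. The random data, uniform weights, and squared Euclidean cost are all assumptions. It then measures how spread out each real observation's row of the plan is. Larger ε spreads each row over more simulated observations, so a coupling-weighted posterior mixes more NPE posteriors and tends to be wider and more conservative. Smaller ε concentrates the rows and gives sharper, more informative posteriors.

```python
import numpy as np

def sinkhorn(cost, epsilon, n_iters=2000):
    """Entropic OT plan between uniform weights on the rows and columns of `cost`."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)        # uniform mass on real-world observations
    b = np.full(m, 1.0 / m)        # uniform mass on simulated observations
    K = np.exp(-cost / epsilon)    # Gibbs kernel; small epsilon makes it sharply peaked
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)            # rescale to match the row marginals
        v = b / (K.T @ u)          # rescale to match the column marginals
    return u[:, None] * K * v[None, :]

def mean_row_entropy(plan):
    """Average entropy of each real observation's distribution over simulated observations."""
    rows = plan / plan.sum(axis=1, keepdims=True)
    return float(-(rows * np.log(np.maximum(rows, 1e-300))).sum(axis=1).mean())  # guard log(0)

rng = np.random.default_rng(0)
z_real = rng.normal(size=(6, 2)) + 1.0   # hypothetical real-world representations (shifted: misspecified)
z_sim = rng.normal(size=(50, 2))         # hypothetical simulated representations
cost = ((z_real[:, None, :] - z_sim[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()                 # rescale costs to [0, 1] so exp(-cost / eps) stays well conditioned

entropies = [mean_row_entropy(sinkhorn(cost, eps)) for eps in (0.02, 0.2, 2.0)]
print(entropies)  # should increase with epsilon; the upper bound log(50) ≈ 3.91 is the independent coupling
```

The cost rescaling is a practical choice: this plain (non-log-domain) Sinkhorn iteration can underflow when ε is small relative to the costs.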

The work has both practical and theoretical implications. Practically, it gives scientists and engineers a more reliable inference tool for complex systems, where perfect simulators are rarely feasible. Theoretically, it extends SBI with a principled way to bridge the gap between imperfect simulators and real-world data.

Future work could examine the effects of a misspecified prior, which the method currently assumes away. It could also extend the approach to high-dimensional parameter spaces, where data sparsity and the curse of dimensionality complicate inference. Another promising direction is adapting the framework to noisy or incomplete real-world datasets, which would broaden its robustness and applicability.

This work is a substantive step toward addressing a fundamental challenge in SBI and toward more reliable scientific modeling and inference.
