Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation (2308.01054v3)
Abstract: Neural likelihood estimation methods for simulation-based inference (SBI) can degrade in performance when the modeled data is very high-dimensional or lies along a lower-dimensional manifold, because the density estimator cannot accurately estimate the density function in these settings. We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel member of the family of SBI methods. SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function, which allows for computational inference via Markov chain Monte Carlo or variational Bayes methods. Among other benefits, SSNL avoids the need to manually craft summary statistics for inference of high-dimensional data sets, since the lower-dimensional representation is computed simultaneously with learning the likelihood and without additional computational overhead. We evaluate SSNL on a wide variety of experiments, including two challenging real-world examples from the astrophysics and neuroscience literature, and show that it either outperforms or is on par with state-of-the-art methods, making it an excellent off-the-shelf estimator for SBI on high-dimensional data sets.
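To make the idea of a dimensionality-reducing surrogate likelihood concrete, the following is a minimal NumPy sketch of a single slicing surjection. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear conditioners (`W_shift`, `W_scale`, `W_dec`), and the Gaussian decoder are all hypothetical choices made for brevity, whereas an actual SSNL model would presumably use neural-network conditioners and be trained over sequential simulation rounds. The sketch keeps the first `k` coordinates of the data, maps them to a latent `z` with an affine bijection conditioned on the parameters, and accounts for the dropped coordinates with a conditional density given `z`, so the result is still an exact log density over the full data vector.

```python
import numpy as np


def gaussian_logpdf(x, mean, log_std):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    z = (x - mean) / np.exp(log_std)
    return np.sum(-0.5 * z**2 - log_std - 0.5 * np.log(2.0 * np.pi), axis=-1)


def surjective_loglik(y, theta, params, k):
    """Toy surrogate log-likelihood log q(y | theta) with one slicing surjection.

    y:     (n, d) simulated data
    theta: (n, p) simulator parameters
    k:     number of data coordinates kept in the latent representation
    All entries of `params` are hypothetical linear conditioners.
    """
    y_keep, y_drop = y[:, :k], y[:, k:]

    # Bijective part: elementwise affine map of the kept coordinates,
    # with shift and log-scale conditioned on theta.
    shift = theta @ params["W_shift"]                 # (n, k)
    log_scale = np.tanh(theta @ params["W_scale"])    # (n, k), bounded
    z = (y_keep - shift) * np.exp(-log_scale)
    log_det = -np.sum(log_scale, axis=-1)             # log |det dz/dy_keep|

    # Base density on the k-dimensional latent: standard normal.
    log_base = gaussian_logpdf(z, 0.0, 0.0)

    # Surjective part: density of the dropped coordinates given z.
    dec_mean = z @ params["W_dec"]                    # (n, d - k)
    log_dec = gaussian_logpdf(y_drop, dec_mean, params["dec_log_std"])

    return log_base + log_det + log_dec


# Usage with random toy inputs (d = 6 data dims reduced to k = 2).
rng = np.random.default_rng(0)
n, d, p, k = 5, 6, 3, 2
y = rng.normal(size=(n, d))
theta = rng.normal(size=(n, p))
params = {
    "W_shift": 0.1 * rng.normal(size=(p, k)),
    "W_scale": 0.1 * rng.normal(size=(p, k)),
    "W_dec": 0.1 * rng.normal(size=(k, d - k)),
    "dec_log_std": np.zeros(d - k),
}
ll = surjective_loglik(y, theta, params, k)
```

Because the returned value is a differentiable function of `theta` for fixed `y`, a surrogate of this form can in principle be combined with a prior and passed to an MCMC or variational routine, which is the role the abstract assigns to the fitted flow.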