Parameter uncertainties for imperfect surrogate models in the low-noise regime (2402.01810v5)
Abstract: Bayesian regression determines model parameters by minimizing the expected loss, an upper bound on the true generalization error. However, this loss ignores misspecification, the case where the model is imperfect. Parameter uncertainties from Bayesian regression are thus significantly underestimated and vanish in the large-data limit. This is particularly problematic when building models of low-noise, or near-deterministic, calculations, as the main source of uncertainty is neglected. We analyze the generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show that posterior distributions must cover every training point to avoid a divergent generalization error, and we design an ansatz that respects this constraint, which for linear models incurs minimal overhead. The approach is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors where existing schemes fail, allowing this important source of uncertainty to be incorporated in computational workflows.
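The claim that standard Bayesian parameter uncertainties vanish in the large-data limit while the misspecification error remains can be illustrated with a small numerical sketch. The setup below is an assumption made purely for illustration and is not taken from the paper: a linear model is fit to a quadratic target with very low noise, using a flat prior and a known noise level, so the Gaussian posterior covariance is `noise**2 * inv(X.T @ X)`. The function name `posterior_std_and_rmse` is hypothetical.

```python
import numpy as np

def posterior_std_and_rmse(n, noise=1e-3, seed=0):
    """Fit a misspecified linear model; return (max predictive std, residual RMSE)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    # Near-deterministic data: the true function is quadratic, noise is tiny.
    y = x**2 + noise * rng.normal(size=n)
    # Misspecified model: constant and linear features only.
    X = np.column_stack([np.ones(n), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Gaussian posterior covariance for a flat prior and known noise level.
    cov = noise**2 * np.linalg.inv(X.T @ X)
    # Posterior predictive std at each training input (parameter uncertainty only).
    pred_std = np.sqrt(np.einsum("ij,jk,ik->i", X, cov, X))
    rmse = np.sqrt(np.mean((X @ theta - y) ** 2))
    return pred_std.max(), rmse

results = {n: posterior_std_and_rmse(n) for n in (100, 10_000)}
for n, (max_std, rmse) in results.items():
    print(f"N={n:6d}  max posterior std={max_std:.2e}  residual RMSE={rmse:.2e}")
```

In this toy setting the posterior width shrinks as the dataset grows, while the residual error stays at the level set by the missing quadratic term, so the posterior does not cover the training points. This only illustrates the failure mode described in the abstract; it does not implement the misspecification-aware ansatz the paper proposes.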