PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation (2402.04355v1)
Abstract: We propose a comprehensive sample-based method for assessing the quality of generative models. The approach estimates the probability that two sets of samples are drawn from the same distribution, providing a statistically rigorous way to evaluate a single generative model or to compare multiple competing models trained on the same dataset. The comparison is conducted by dividing the data space into non-overlapping regions and comparing the number of samples that fall in each region. The method requires only samples from the generative model and from the test data, and it operates directly on high-dimensional data, obviating the need for dimensionality reduction. Notably, it makes no assumptions about the density of the true distribution and does not rely on training or fitting any auxiliary models; instead, it approximates the integral of the density (the probability mass) over sub-regions of the data space.
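The region-counting idea in the abstract lends itself to a compact illustration. Below is a minimal sketch, assuming non-overlapping regions defined as Voronoi cells of randomly chosen reference points and a Pearson chi-squared comparison of the per-region counts; the function name `pqm_chi2`, the default number of regions, and these particular choices are illustrative assumptions, not necessarily the paper's exact algorithm.

```python
# A minimal sketch of the region-counting idea, under the assumptions
# stated above (random Voronoi partition + two-sample chi-squared test).
import numpy as np
from scipy.stats import chi2


def pqm_chi2(x, y, n_regions=100, seed=0):
    """Compare two sample sets by counting how many of each fall into
    non-overlapping regions of the data space.

    x, y : arrays of shape (n_samples, n_features)
    Returns the Pearson chi-squared statistic and its p-value under
    the null hypothesis that x and y come from the same distribution.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y], axis=0)
    # Non-overlapping regions: Voronoi cells induced by reference points
    # drawn (without replacement) from the pooled samples.
    refs = pooled[rng.choice(len(pooled), size=n_regions, replace=False)]

    def counts(samples):
        # Assign each sample to its nearest reference point, i.e. its region.
        d = np.linalg.norm(samples[:, None, :] - refs[None, :, :], axis=-1)
        return np.bincount(d.argmin(axis=1), minlength=n_regions)

    cx, cy = counts(x), counts(y)
    n_x, n_y = len(x), len(y)
    total = cx + cy
    # Expected counts per region under the null, proportional to sample sizes.
    ex = total * n_x / (n_x + n_y)
    ey = total * n_y / (n_x + n_y)
    mask = total > 0  # ignore empty regions
    stat = (((cx - ex)[mask] ** 2) / ex[mask]).sum() + \
           (((cy - ey)[mask] ** 2) / ey[mask]).sum()
    dof = mask.sum() - 1
    return stat, chi2.sf(stat, dof)
```

Called as `stat, p = pqm_chi2(model_samples, test_samples)`, a small p-value indicates that the counts across regions, and hence the probability masses, of the two sample sets are unlikely to have come from the same underlying distribution.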