
PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation

Published 6 Feb 2024 in stat.ML, cs.AI, cs.LG, and stat.ME (arXiv:2402.04355v2)

Abstract: We propose a likelihood-free method for comparing two distributions given samples from each, with the goal of assessing the quality of generative models. The proposed approach, PQMass, provides a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models. PQMass divides the sample space into non-overlapping regions and applies chi-squared tests to the number of data samples that fall within each region, giving a p-value that measures the probability that the bin counts derived from two sets of samples are drawn from the same multinomial distribution. PQMass does not depend on assumptions regarding the density of the true distribution, nor does it rely on training or fitting any auxiliary models. We evaluate PQMass on data of various modalities and dimensions, demonstrating its effectiveness in assessing the quality, novelty, and diversity of generated samples. We further show that PQMass scales well to moderately high-dimensional data and thus obviates the need for feature extraction in practical applications.
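The procedure described in the abstract can be sketched concretely. The snippet below is a minimal, illustrative implementation of a PQMass-style two-sample test, not the paper's exact algorithm: it assumes the non-overlapping regions are Voronoi cells induced by reference points subsampled from the pooled data, and it applies a standard chi-squared test of homogeneity to the per-region counts. The function name `pqmass_pvalue` and the partitioning choice are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2

def pqmass_pvalue(x, y, n_regions=50, rng=None):
    """Minimal sketch of a PQMass-style two-sample test.

    Partitions the sample space into non-overlapping regions (here,
    Voronoi cells induced by reference points drawn from the pooled
    samples), counts how many points from each sample fall in each
    region, and applies a chi-squared homogeneity test to the counts.
    """
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([x, y], axis=0)
    # Reference points defining the Voronoi regions (an illustrative
    # choice; the paper's exact partitioning scheme may differ).
    refs = pooled[rng.choice(len(pooled), size=n_regions, replace=False)]

    def region_counts(samples):
        # Assign each sample to its nearest reference point.
        d = np.linalg.norm(samples[:, None, :] - refs[None, :, :], axis=-1)
        return np.bincount(d.argmin(axis=1), minlength=n_regions)

    cx, cy = region_counts(x), region_counts(y)
    nx, ny = cx.sum(), cy.sum()
    # Under the null, both count vectors share the pooled bin frequencies.
    p_hat = (cx + cy) / (nx + ny)
    # Chi-squared homogeneity statistic; empty regions contribute zero.
    stat = (((cx - nx * p_hat) ** 2 / np.maximum(nx * p_hat, 1e-12)).sum()
            + ((cy - ny * p_hat) ** 2 / np.maximum(ny * p_hat, 1e-12)).sum())
    dof = n_regions - 1
    return chi2.sf(stat, dof)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same = pqmass_pvalue(rng.normal(size=(2000, 8)),
                         rng.normal(size=(2000, 8)))
    shifted = pqmass_pvalue(rng.normal(size=(2000, 8)),
                            rng.normal(0.3, 1.0, size=(2000, 8)))
    print(f"same distribution p={same:.3f}, shifted p={shifted:.3g}")
```

For two independent draws from the same distribution the p-value should be roughly uniform on [0, 1]; for draws from different distributions it concentrates near zero. That contrast is what lets a test of this kind score a single generative model against held-out data or rank several competing models, as the abstract describes.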
