A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science (2403.12636v2)

Published 19 Mar 2024 in cs.LG and stat.ML

Abstract: Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an accessible entry point to understanding popular sample-based statistical distances, requiring only foundational knowledge in mathematics and statistics. We focus on four commonly used notions of statistical distance representing different methodologies: using low-dimensional projections (Sliced-Wasserstein; SW), obtaining a distance using classifiers (Classifier Two-Sample Tests; C2ST), and using embeddings through kernels (Maximum Mean Discrepancy; MMD) or neural networks (Fréchet Inception Distance; FID). We highlight the intuition behind each distance and explain their merits, scalability, complexity, and pitfalls. To demonstrate how these distances are used in practice, we evaluate generative models from different scientific domains, namely a model of decision-making and a model generating medical images. We showcase that distinct distances can give different results on similar data. Through this guide, we aim to help researchers use, interpret, and evaluate statistical distances for generative models in science.
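To make two of these notions concrete, below is a minimal Python sketch, not taken from the paper: the function names, the Gaussian-kernel bandwidth, and the number of random projections are illustrative assumptions. It estimates the Sliced-Wasserstein distance by averaging 1D Wasserstein-1 distances over random projections, and the unbiased squared MMD with a Gaussian kernel. C2ST and FID are omitted here because they additionally require training a classifier or running an Inception embedding network.

import numpy as np

def sliced_wasserstein(x, y, n_projections=100, seed=None):
    """Monte Carlo SW-1 estimate between sample sets x, y of shape (n, d).

    Assumes equal sample sizes, so the 1D Wasserstein-1 distance reduces
    to the mean absolute difference of sorted projected samples.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto each direction and sort per projection.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # Average the per-projection 1D Wasserstein-1 distances.
    return np.abs(px - py).mean()

def mmd_unbiased(x, y, bandwidth=1.0):
    """Unbiased squared-MMD estimate with a Gaussian (RBF) kernel."""
    def gram(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * bandwidth ** 2))
    kxx, kyy, kxy = gram(x, x), gram(y, y), gram(x, y)
    n, m = len(x), len(y)
    # Drop the diagonal (self-similarity) terms for the unbiased estimator;
    # the result can therefore be slightly negative for close distributions.
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * kxy.mean()

# Toy usage: compare samples from two shifted Gaussians.
rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=(500, 5))
q = rng.normal(0.5, 1.0, size=(500, 5))
print("SW :", sliced_wasserstein(p, q, seed=0))
print("MMD:", mmd_unbiased(p, q))

Note the contrasting costs, which the paper's scalability discussion makes precise: the sliced estimate is O(n log n) per projection, while the naive kernel estimate is quadratic in the number of samples.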
