
What makes an image realistic? (2403.04493v4)

Published 7 Mar 2024 in cs.LG and stat.ML

Abstract: The last decade has seen tremendous progress in our ability to generate realistic-looking data, be it images, text, audio, or video. Here, we discuss the closely related problem of quantifying realism, that is, designing functions that can reliably tell realistic data from unrealistic data. This problem turns out to be significantly harder to solve and remains poorly understood, despite its prevalence in machine learning and recent breakthroughs in generative AI. Drawing on insights from algorithmic information theory, we discuss why this problem is challenging, why a good generative model alone is insufficient to solve it, and what a good solution would look like. In particular, we introduce the notion of a universal critic, which unlike adversarial critics does not require adversarial training. While universal critics are not immediately practical, they can serve both as a North Star for guiding practical implementations and as a tool for analyzing existing attempts to capture realism.


Summary

  • The paper presents the universal critic as a novel way to assess image realism, defining it via randomness deficiency, a quantity rooted in Kolmogorov complexity.
  • It critiques conventional approaches such as probability and weak typicality, demonstrating why neither is sufficient to capture realism.
  • The study outlines practical implications for applications such as anomaly detection and deepfake identification in generative AI.

Exploring the Foundations of Realism in Generative Models: Introducing the Universal Critic

Introduction to Realism in Generative Models

The field of generative modeling has made significant advances in producing data that blurs the line between real and synthetic. Yet a foundational question remains only partially answered: what makes data, and images in particular, realistic? The question is difficult because realism is inherently subjective and lacks a widely accepted formal definition. This paper examines the problem of quantifying realism, explains why prevailing methods fall short, and proposes a new conceptual framework, termed the universal critic.

Limitations of Existing Approaches

The quest to quantify realism has traditionally revolved around concepts like probability and typicality, both of which have shown considerable limitations:

  • Probability: A natural first attempt is to equate realism with high probability under a good model. This fails: the most probable outputs of a model need not be realistic. For example, the mode of a density over natural images can be a flat, featureless image that the model would almost never actually sample.
  • Weak Typicality: Refining this, weak typicality requires that the negative log-probability of "realistic" data stay close to the model's entropy. This is a step in the right direction, but it remains insufficient as a definition of realism, since the typical set contains both realistic and unrealistic examples.

Both approaches share a critical oversight: they quantify how probable data is under a fixed model, rather than whether that model's view of the data reflects what makes it realistic in the first place.
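To make the probability objection concrete, here is a minimal numerical sketch (ours, not from the paper), using an isotropic Gaussian as a stand-in "image model": the all-zero image maximizes the density, yet its log-probability lies far from the typical level where actual samples concentrate.

```python
import math
import random

d = 32 * 32  # pixels of a hypothetical grayscale image, flattened

def neg_log_density(x):
    """Negative log-density under an isotropic standard Gaussian (nats)."""
    return 0.5 * sum(v * v for v in x) + 0.5 * d * math.log(2 * math.pi)

entropy = 0.5 * d * math.log(2 * math.pi * math.e)  # differential entropy

random.seed(0)
x_sample = [random.gauss(0.0, 1.0) for _ in range(d)]  # typical model sample
x_blank = [0.0] * d                                    # all-zero "image": the mode

# The blank image is the single most probable point under the model...
assert neg_log_density(x_blank) < neg_log_density(x_sample)

# ...yet its negative log-density sits d/2 nats below the entropy,
# far outside the typical set where actual samples concentrate.
print(entropy - neg_log_density(x_blank))  # ≈ 512 nats (= d/2)
```

Weak typicality would correctly reject the blank image here, but, as noted above, it still admits unrealistic members of the typical set.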

A Novel Perspective: Universal Critics

The paper introduces the universal critic as a theoretical construct that addresses the shortcomings of probability- and typicality-based approaches. Rooted in algorithmic information theory, the universal critic scores data by its randomness deficiency: the gap between the data's codelength under the model and its Kolmogorov complexity. This gap tracks the intuitive notion of realism more closely than either quantity alone.
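In symbols (our notation; the paper states this more carefully), the randomness deficiency of a datum $x$ under a model $P$ can be written as

```latex
d(x \mid P) \;=\; -\log_2 P(x) \;-\; K(x \mid P)
```

Here $-\log_2 P(x)$ is the codelength of $x$ under an ideal code for $P$, and $K(x \mid P)$ is the length of the shortest program that outputs $x$ given $P$. Realistic data has small deficiency: it is about as hard to describe as the model predicts. Unrealistic data, such as a conspicuously simple image, can be described far more cheaply than its model codelength suggests, yielding a large deficiency.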

Practical Implications and Theoretical Significance

The universal critic holds significant potential for both theoretical exploration and practical application in fields such as anomaly detection, deepfake identification, and generative model evaluation. It aligns with human perception by treating realism as plausibility relative to a data distribution while discounting easily describable structure. Because Kolmogorov complexity is uncomputable, the universal critic is not immediately practical; even so, it offers a principled target for evaluating realism and a lens for analyzing existing heuristics.

Future Directions

Looking ahead, a critical area of exploration lies in operationalizing the universal critic for real-world applications. This involves finding approximate but effective measures of randomness deficiency that can be practically applied to evaluate and enhance the realism of generatively produced data. Moreover, the notion of batched universal critics opens up avenues for developing more nuanced evaluations that incorporate multiple data samples, potentially offering a pathway to creating more sophisticated and human-like assessment models for generative content.
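As one concrete illustration of such an approximation (our sketch, not the paper's proposal): Kolmogorov complexity is uncomputable, but a lossless compressor yields an upper bound on it, so a crude stand-in for randomness deficiency subtracts the compressed length from the model's codelength. Variants of this idea appear in the out-of-distribution detection literature.

```python
import os
import zlib

def deficiency_score(x_bytes: bytes, neg_log_p_bits: float) -> float:
    """Crude proxy for randomness deficiency: the model's codelength (bits)
    minus a lossless-compression upper bound on Kolmogorov complexity.
    Large positive values flag data that is far simpler than the model's
    codelength suggests, i.e. atypical or unrealistic under the model."""
    k_upper_bound_bits = 8 * len(zlib.compress(x_bytes, level=9))
    return neg_log_p_bits - k_upper_bound_bits

# Hypothetical codelengths: assume a model spending 8 bits per byte.
random_bytes = os.urandom(1024)  # incompressible -> small deficiency
simple_bytes = b"\x00" * 1024    # highly compressible -> large deficiency

print(deficiency_score(random_bytes, 8 * 1024))  # near zero (slightly negative)
print(deficiency_score(simple_bytes, 8 * 1024))  # large and positive
```

In practice the negative log-probability would come from a trained generative model rather than the flat 8-bits-per-byte assumption used here, and a general-purpose compressor is only a loose bound on true complexity.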

Conclusion

This paper sets the stage for a reevaluation of how the research community approaches realism in generative models. By exposing the limitations of existing methods and proposing the universal critic, it invites researchers to rethink the foundations on which future generative AI technologies can be built. The problem of quantifying realism remains open, but this work contributes a pivotal step forward, offering both a critique of the status quo and a clear direction for future exploration.
