What makes an image realistic? (2403.04493v4)
Abstract: The last decade has seen tremendous progress in our ability to generate realistic-looking data, be it images, text, audio, or video. Here, we discuss the closely related problem of quantifying realism, that is, designing functions that can reliably tell realistic data from unrealistic data. This problem turns out to be significantly harder to solve and remains poorly understood, despite its prevalence in machine learning and recent breakthroughs in generative AI. Drawing on insights from algorithmic information theory, we discuss why this problem is challenging, why a good generative model alone is insufficient to solve it, and what a good solution would look like. In particular, we introduce the notion of a universal critic, which unlike adversarial critics does not require adversarial training. While universal critics are not immediately practical, they can serve both as a North Star for guiding practical implementations and as a tool for analyzing existing attempts to capture realism.
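To make the idea of a universal critic concrete, here is a minimal sketch under some loud assumptions: in algorithmic information theory, the randomness deficiency of a point x under a model P can be written as -log2 P(x) - K(x), where K is the (uncomputable) Kolmogorov complexity; the paper's universal critic builds on this quantity. The code below substitutes a general-purpose compressor (zlib) as a crude upper bound on K(x). The function names and the toy uniform model are hypothetical choices for illustration, not the paper's implementation.

```python
import zlib


def compressed_length_bits(data: bytes) -> int:
    # Upper-bound proxy for the Kolmogorov complexity K(x):
    # the length in bits of a zlib-compressed description of x.
    return 8 * len(zlib.compress(data, level=9))


def universal_critic_proxy(data: bytes, neg_log2_p: float) -> float:
    # Randomness-deficiency proxy: -log2 P(x) - K(x).
    # Large positive values flag x as atypical (unrealistic) under P;
    # values near zero are consistent with x being a typical sample.
    return neg_log2_p - compressed_length_bits(data)


# Toy model: P uniform over 1000-byte strings, so -log2 P(x) = 8000 bits.
n = 1000
structured = b"a" * n          # highly regular, compresses very well
score = universal_critic_proxy(structured, 8.0 * n)
# A large positive score: the uniform model assigns this string far more
# code length than its actual description length, so it is atypical.
```

Note that this sketch only inherits the *direction* of the argument: a real universal critic would replace both the uniform model with a learned generative model's log-likelihood and the compressor with a tighter complexity bound, and the paper's point is precisely that neither ingredient alone suffices.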