A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data (2402.16991v2)
Abstract: Understanding the structure of real data is paramount in advancing modern deep-learning methodologies. Natural data such as images are believed to be composed of features organised in a hierarchical and combinatorial manner, which neural networks capture during learning. Recent advancements show that diffusion models can generate high-quality images, hinting at their ability to capture this underlying structure. We study this phenomenon in a hierarchical generative model of data. We find that the backward diffusion process acting after a time $t$ is governed by a phase transition at some threshold time, where the probability of reconstructing high-level features, like the class of an image, suddenly drops. Instead, the reconstruction of low-level features, such as specific details of an image, evolves smoothly across the whole diffusion process. This result implies that at times beyond the transition, the class has changed but the generated sample may still be composed of low-level elements of the initial image. We validate these theoretical insights through numerical experiments on class-unconditional ImageNet diffusion models. Our analysis characterises the relationship between time and scale in diffusion models and puts forward generative models as powerful tools to model combinatorial data properties.
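To make the experimental protocol concrete, the sketch below illustrates the standard DDPM closed-form forward (noising) step that an inversion at time $t$ relies on. This is a minimal illustration, not the authors' code: the linear noise schedule, tensor shapes, probe times, and the `noise_to_time` helper are illustrative assumptions, and the backward denoising with a pretrained class-unconditional ImageNet model plus the classifier check of class retention are only indicated in comments.

```python
import torch

# Illustrative DDPM-style forward process: a closed-form sample of
# x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # standard linear schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # \bar{alpha}_t

def noise_to_time(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Noise a clean sample x0 forward to diffusion time t in one step."""
    eps = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * eps

# Hypothetical probe: noise an image to several times t, then (not shown)
# denoise each x_t with a pretrained class-unconditional diffusion model
# and ask a classifier whether the original class survived. The paper
# predicts a sharp drop in class retention at a threshold time, while
# low-level details of the initial image degrade smoothly.
x0 = torch.randn(3, 64, 64)                      # stand-in for a real ImageNet sample
for t in [100, 400, 700]:
    xt = noise_to_time(x0, t)
    print(t, xt.std().item())                    # noise level grows with t
```

The closed form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon$ lets such a probe jump directly to any inversion time $t$ without simulating the noising Markov chain step by step.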
- Antonio Sclocchi
- Alessandro Favero
- Matthieu Wyart