Latent Schrödinger Bridge Diffusion Model for Generative Learning (2404.13309v2)
Abstract: This paper aims to conduct a comprehensive theoretical analysis of current diffusion models. We introduce a novel generative learning methodology that uses a Schrödinger bridge diffusion model in latent space as the framework for theoretical exploration in this domain. Our approach begins by pre-training an encoder-decoder architecture on data that may come from a distribution diverging from the target distribution, which accommodates a large sample size by leveraging pre-existing large-scale models. We then build a diffusion model in the latent space using the Schrödinger bridge framework. Our theoretical analysis establishes an end-to-end error analysis for learning distributions via the latent Schrödinger bridge diffusion model; specifically, we control the second-order Wasserstein distance between the generated distribution and the target distribution. Moreover, the convergence rates we obtain effectively mitigate the curse of dimensionality, offering robust theoretical support for prevailing diffusion models.
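The second-order Wasserstein distance referred to in the abstract is the standard optimal-transport metric

$$W_2(\mu, \nu) = \Big( \inf_{\gamma \in \Pi(\mu, \nu)} \int \|x - y\|^2 \, d\gamma(x, y) \Big)^{1/2},$$

where $\Pi(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$.

As a concrete illustration of the two-stage recipe the abstract outlines, below is a minimal PyTorch sketch: stage 1 pre-trains an encoder-decoder, and stage 2 fits a drift network for a Schrödinger-bridge-style diffusion in the latent space via a bridge-matching surrogate objective. Everything here is an assumption for illustration only; the fully-connected architectures, the surrogate loss, and all names (`AutoEncoder`, `DriftNet`, `train_latent_bridge`, `sample`) are hypothetical and do not reproduce the estimator analyzed in the paper.

```python
# Illustrative two-stage sketch (hypothetical, not the paper's estimator):
# stage 1 pre-trains an encoder-decoder; stage 2 learns a drift for a
# Schrodinger-bridge-style diffusion in the latent space.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Hypothetical fully-connected encoder-decoder."""
    def __init__(self, data_dim, latent_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(data_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, data_dim))

def pretrain_autoencoder(ae, loader, epochs=10, lr=1e-3):
    # Stage 1: reconstruction pre-training; as the abstract notes, this data
    # may come from a distribution different from the target.
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:  # loader yields batches of shape (B, data_dim)
            loss = ((ae.decoder(ae.encoder(x)) - x) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()

class DriftNet(nn.Module):
    """Time-dependent drift b(z, t) for the latent bridge SDE."""
    def __init__(self, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim))
    def forward(self, z, t):
        return self.net(torch.cat([z, t], dim=-1))

def train_latent_bridge(drift, encoder, loader, epochs=10, lr=1e-3, t_max=0.999):
    # Stage 2: bridge-matching surrogate. For a Brownian bridge from
    # z0 ~ N(0, I) at t = 0 to a data latent z1 at t = 1, the conditional
    # drift is (z1 - z_t) / (1 - t); we regress the network onto it.
    opt = torch.optim.Adam(drift.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:
            with torch.no_grad():
                z1 = encoder(x)                    # latent codes of the data
            z0 = torch.randn_like(z1)              # Gaussian reference endpoint
            t = torch.rand(z1.size(0), 1) * t_max  # avoid the t -> 1 singularity
            zt = (1 - t) * z0 + t * z1 + torch.sqrt(t * (1 - t)) * torch.randn_like(z1)
            loss = ((drift(zt, t) - (z1 - zt) / (1 - t)) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()

@torch.no_grad()
def sample(drift, decoder, n, latent_dim, steps=200):
    # Euler-Maruyama integration of dZ = b(Z, t) dt + dW from t = 0 toward
    # t = 1, stopping one step short of the endpoint singularity, then decoding.
    z, dt = torch.randn(n, latent_dim), 1.0 / steps
    for k in range(steps - 1):
        t = torch.full((n, 1), k * dt)
        z = z + drift(z, t) * dt + (dt ** 0.5) * torch.randn_like(z)
    return decoder(z)
```

In this sketch the generated distribution is the law of `decoder(Z_1)`, so the decoder's approximation error and the drift-estimation error both enter the end-to-end $W_2$ bound, which is the decomposition the abstract's error analysis concerns.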