SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers (2401.08740v2)
Abstract: We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which allows for connecting two distributions in a more flexible way than standard diffusion models, makes possible a modular study of various design choices impacting generative models built on dynamical transport: learning in discrete or continuous time, the objective function, the interpolant that connects the distributions, and deterministic or stochastic sampling. By carefully introducing the above ingredients, SiT surpasses DiT uniformly across model sizes on the class-conditional ImageNet 256x256 and 512x512 benchmarks while using the exact same model structure, number of parameters, and GFLOPs. By exploring various diffusion coefficients, which can be tuned separately from learning, SiT achieves FID-50K scores of 2.06 and 2.62 on the two benchmarks, respectively.
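To make the interpolant framework concrete, below is a minimal sketch of a velocity-matching objective under one specific design choice: a linear interpolant x_t = (1 - t) x + t eps connecting data x to Gaussian noise eps in continuous time. The endpoint convention, function name, and model signature (x_t, t, y) mirroring a DiT-style class-conditional backbone are illustrative assumptions, not the paper's released code; the paper studies several other interpolants and objectives.

```python
import torch
import torch.nn as nn

def sit_velocity_loss(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Velocity-matching loss for a linear stochastic interpolant (sketch).

    Interpolant: x_t = (1 - t) * x + t * eps, so the conditional velocity is
    d/dt x_t = eps - x. The choice of which endpoint is data vs. noise is an
    assumption here; SiT explores multiple interpolant/objective combinations.
    """
    b = x.shape[0]
    t = torch.rand(b, device=x.device)          # continuous time in [0, 1]
    eps = torch.randn_like(x)                   # Gaussian endpoint sample
    t_ = t.view(b, *([1] * (x.dim() - 1)))      # broadcast t over image dims
    x_t = (1.0 - t_) * x + t_ * eps             # linear interpolant
    v_target = eps - x                          # exact conditional velocity
    v_pred = model(x_t, t, y)                   # hypothetical (x_t, t, class) signature
    return (v_pred - v_target).pow(2).mean()
```

At sampling time the learned velocity can be integrated deterministically as the probability-flow ODE dx/dt = v(x, t), or stochastically via an equivalent SDE; the diffusion coefficient of that SDE is chosen after training, which is the separately tunable knob the abstract refers to.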