aMUSEd: An Open MUSE Reproduction
Abstract: We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.