Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis (2405.08210v1)

Published 13 May 2024 in cs.CV

Abstract: We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can be optionally generated from a text-to-image model like DALL-E 2. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU. We compare synthesized textures from our method to existing work in patch-based and deep learning texture synthesis methods. We also showcase two applications of our generated textures in 3D rendering and texture transfer.

Summary

  • The paper demonstrates a novel method in which a diffusion model is fine-tuned on a single reference texture so that it can synthesize high-resolution textures.
  • It employs a multi-stage process with score aggregation to overcome memory constraints and generate textures up to 85 megapixels.
  • The approach simplifies 3D rendering workflows by efficiently producing diverse, realistic textures from text prompts on a single GPU.

Infinite Texture: Generating High-Resolution Textures with Diffusion Models

Introduction

Generating realistic textures is a crucial aspect of computer graphics. Textures help simulate detailed surfaces such as wood, fabric, and skin, enhancing the visual quality of computer-generated imagery. While traditional methods often require significant manual effort, modern approaches increasingly leverage machine learning to tackle this challenge. The paper we're discussing introduces Infinite Texture, a method capable of generating arbitrarily large, high-quality textures from text prompts using diffusion models.

Key Methodology

Infinite Texture stands out by employing a multi-stage process for texture generation:

  1. Generate Reference Texture from Text: Initially, a reference texture image is generated from a text prompt using a pre-trained text-to-image diffusion model (like DALL-E 2).
  2. Fine-Tune Diffusion Model: This reference texture is then used to fine-tune a pretrained diffusion model so that it learns the texture's statistical distribution (a minimal training-loop sketch follows this list).
  3. Synthesize High-Resolution Textures: Finally, the fine-tuned diffusion model is used to produce high-quality, high-resolution textures while sidestepping the memory constraints of single-pass generation.
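
To make the fine-tuning stage concrete, the sketch below shows what training on a single texture can look like with a standard DDPM noise-prediction objective applied to random crops of the reference. It is a minimal, illustrative loop, not the authors' implementation: the paper fine-tunes a pretrained text-to-image model, which involves a text encoder and latent-space machinery omitted here, and the `denoiser(x_t, t)` interface, crop size, and schedule constants are assumptions.

```python
import torch
import torch.nn.functional as F

def finetune_on_texture(denoiser, reference, steps=1000, patch=64,
                        batch=8, T=1000, lr=1e-5, device="cpu"):
    """Fine-tune a noise-prediction network on random crops of one reference
    texture (a C x H x W tensor scaled to [-1, 1]). Minimal DDPM-style loop;
    `denoiser` is any module mapping (x_t, t) -> predicted noise."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)   # linear noise schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)           # cumulative signal fraction
    opt = torch.optim.AdamW(denoiser.parameters(), lr=lr)
    reference = reference.to(device)
    _, H, W = reference.shape

    for _ in range(steps):
        # Sample a batch of random crops from the single reference texture.
        ys = torch.randint(0, H - patch + 1, (batch,)).tolist()
        xs = torch.randint(0, W - patch + 1, (batch,)).tolist()
        x0 = torch.stack([reference[:, y:y + patch, x:x + patch]
                          for y, x in zip(ys, xs)])

        # Forward diffusion: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps.
        t = torch.randint(0, T, (batch,), device=device)
        eps = torch.randn_like(x0)
        a = alpha_bar[t].view(-1, 1, 1, 1)
        x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps

        # The denoiser learns to predict the injected noise.
        loss = F.mse_loss(denoiser(x_t, t), eps)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return denoiser
```

Cropping random patches, rather than always showing the whole reference, is what encourages the model to capture the texture's local statistics instead of memorizing a fixed global layout.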

Diffusion Models in Texture Synthesis

Diffusion models are powerful probabilistic generative models: they work by gradually adding noise to data and learning to reverse that process, so that new data can be generated from pure noise. In Infinite Texture, the process breaks down as follows:

  • Training: Over many iterations, the model learns to denoise noisy versions of patches cropped from the reference texture.
  • Inference: At generation time, a score aggregation strategy stitches the large output together from many smaller patches, keeping the resulting image coherent and high quality (a minimal aggregation sketch follows this list).
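
One common way to realize such aggregation, and a reasonable reading of the paper's strategy, is to run the denoiser on overlapping tiles of a large canvas at every sampling step and average the per-pixel noise predictions wherever tiles overlap. The sketch below shows a single aggregated denoising step; the tile size, stride, and plain averaging rule are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

@torch.no_grad()
def aggregated_noise_prediction(denoiser, x, t, tile=64, stride=48):
    """Predict noise for a canvas larger than the model's training resolution
    by denoising overlapping tiles and averaging predictions in the overlaps."""
    _, _, H, W = x.shape
    assert H >= tile and W >= tile, "canvas must be at least one tile large"
    eps_sum = torch.zeros_like(x)   # accumulated per-pixel noise predictions
    weight = torch.zeros_like(x)    # number of tiles covering each pixel

    ys = list(range(0, H - tile + 1, stride))
    xs = list(range(0, W - tile + 1, stride))
    # Ensure the final row/column of tiles reaches the canvas border.
    if ys[-1] != H - tile:
        ys.append(H - tile)
    if xs[-1] != W - tile:
        xs.append(W - tile)

    for y in ys:
        for x0 in xs:
            patch = x[:, :, y:y + tile, x0:x0 + tile]
            eps = denoiser(patch, t)                       # per-tile estimate
            eps_sum[:, :, y:y + tile, x0:x0 + tile] += eps
            weight[:, :, y:y + tile, x0:x0 + tile] += 1.0

    return eps_sum / weight                                # averaged ("aggregated") score
```

A full sampler would call this function once per timestep inside a standard DDPM or DDIM update; because only one tile at a time passes through the network, the canvas, and hence the output resolution, can grow far beyond what a single forward pass of the denoiser could handle.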

Strong Numerical Results and Claims

The authors present strong performance metrics for Infinite Texture:

  • Resolution: The model can generate textures at resolutions significantly higher than existing methods, up to 85 megapixels.
  • Efficiency: Unlike some earlier methods that are slow and manually intensive, Infinite Texture runs efficiently on a single GPU.
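
A quick back-of-the-envelope calculation, not taken from the paper, shows why tiled generation matters at this scale:

```python
# Rough memory footprint of holding one 85-megapixel RGB image in float32.
pixels = 85_000_000
bytes_per_pixel = 3 * 4                      # 3 channels x 4 bytes (float32)
image_gb = pixels * bytes_per_pixel / 1e9
print(f"{image_gb:.2f} GB")                  # ~1.02 GB for a single image tensor
```

A diffusion U-Net keeps many activation maps of comparable or larger spatial size alive during a single forward pass, so denoising the whole canvas in one shot would far exceed typical GPU memory; processing fixed-size tiles keeps the per-step footprint constant regardless of output resolution.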

Practical Implications

One major practical application showcased in the paper is 3D rendering. High-quality textures are essential for creating realistic 3D models used in games, movies, and virtual reality. Infinite Texture’s ability to generate a diverse array of textures from simple text prompts can vastly simplify content creation workflows in these fields.

Theoretical Implications

Theoretically, this work extends the capabilities of diffusion models beyond traditional image generation. It demonstrates that fine-tuning diffusion models on specific data distributions (like textures) can enable them to perform specialized tasks exceptionally well. This could open up new avenues for research in tailoring generative models to other complex data types.

Future Developments

Given the impressive results of Infinite Texture, it's reasonable to anticipate several future developments:

  • Broader Diversity: Fine-tuning on an even more diverse set of textures could further enhance the model's flexibility and output quality.
  • Real-Time Applications: While the method is efficient, improvements could make real-time texture generation possible, benefiting interactive applications like video games.
  • Integration with Other AI Systems: Combining Infinite Texture with other AI systems like interactive design tools could revolutionize content creation workflows.

Conclusion

Infinite Texture exemplifies how modern AI techniques, specifically diffusion models, can solve practical problems in computer graphics. By enabling easy generation of high-quality, high-resolution textures from text prompts, this method provides a powerful tool for both artists and technical developers. As AI continues to advance, we can expect even more innovative solutions to arise in the field of texture synthesis and beyond.
