ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models (2305.16225v3)

Published 25 May 2023 in cs.GR and cs.CV

Abstract: Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.

Overview of "ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models"

The paper, "ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models," presents a novel mechanism for advancing image generation through diffusion models by enhancing personalization capabilities with a refined focus on specific visual attributes. This research focuses on disentangling and editing complex visual features such as material, style, and layout within images, which are traditionally encapsulated within broad personalization methods.

Core Concept and Methodology

The core innovation is ProSpect, which exploits the step-by-step generation process of diffusion models: because images are synthesized from low- to high-frequency information, the denoising trajectory itself offers a structured handle on visual attributes. ProSpect represents an image as a collection of inverted textual token embeddings, one per generation stage, where each stage is a group of consecutive denoising steps. Conditioning each stage on its own prompt yields markedly better attribute disentanglement than a single inverted embedding.

The authors construct the Prompt Spectrum Space P*, an expanded textual conditioning space that supports richer representation, generation, and editing without fine-tuning the diffusion model. Because different attributes emerge at different stages of generation, P* lets content, material, and style be addressed separately, improving both the flexibility and the precision of attribute manipulation. A minimal sketch of the stage-conditioned mechanism follows.
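
To make the mechanism concrete, here is a minimal sketch (not the authors' code; the names, tensor shapes, and 10-stage split are assumptions) of how a sampler could select a different prompt embedding per generation stage, with stage 0 covering the noisiest, lowest-frequency steps:

```python
# A minimal sketch of ProSpect-style stage-conditioned sampling; names,
# shapes, and the 10-stage split are assumptions, not the authors' code.
import torch

def select_stage_embedding(t: int, total_steps: int,
                           stage_embeddings: list[torch.Tensor]) -> torch.Tensor:
    """Map a diffusion timestep to one of N per-stage prompt embeddings.

    Samplers run from the noisiest step (low-frequency layout) toward the
    final step (high-frequency detail), so early steps index stage 0.
    """
    n_stages = len(stage_embeddings)
    progress = 1.0 - t / total_steps  # 0.0 at the noisiest step
    stage = min(int(progress * n_stages), n_stages - 1)
    return stage_embeddings[stage]

# Example: 10 per-stage token embeddings (placeholders), 50 sampling steps.
stages = [torch.randn(77, 768) for _ in range(10)]
cond = select_stage_embedding(t=49, total_steps=50, stage_embeddings=stages)
```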

Experimental Insights and Results

The paper includes a thorough experimental analysis of the relationship between the diffusion model's generation order and signal frequency, confirming that the earliest denoising stages establish layout, intermediate stages refine content, and the final stages determine high-frequency attributes such as material and style. The results demonstrate ProSpect's superior disentanglement and its capacity for fine-grained control and attribute isolation; the sketch below illustrates the reported stage-attribute correspondence.
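
As a hedged illustration, the correspondence can be expressed as a simple mapping from denoising progress to the attribute it chiefly shapes; the boundaries here are illustrative, not the paper's exact stage splits:

```python
def dominant_attribute(step: int, total_steps: int) -> str:
    """Map denoising progress to the attribute it chiefly shapes.

    step counts completed denoising steps: 0 = pure noise, total_steps = done.
    The thresholds below are illustrative, not the paper's exact boundaries.
    """
    progress = step / total_steps
    if progress < 0.3:
        return "layout"          # low-frequency structure forms first
    if progress < 0.7:
        return "content"         # object identity and shape refine next
    return "material/style"      # high-frequency detail arrives last

print([dominant_attribute(s, 50) for s in (0, 20, 45)])
# ['layout', 'content', 'material/style']
```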

Quantitative evaluations reinforce these findings. CLIP-based metrics show ProSpect leading in both text similarity and image similarity, indicating a favorable balance between fidelity to the reference image and adherence to new textual conditions. User studies likewise report a consistent participant preference for ProSpect outputs over existing baselines.
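
For reference, the two CLIP metrics can be computed along these lines. This is an assumed evaluation protocol sketched with Hugging Face's CLIP wrappers, not the paper's exact script:

```python
# Sketch of CLIP-based evaluation: text similarity compares the editing
# prompt to the generated image; image similarity compares the generated
# image to the reference. Assumed protocol, not the paper's exact script.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_scores(prompt: str, generated: Image.Image, reference: Image.Image):
    inputs = processor(text=[prompt], images=[generated, reference],
                       return_tensors="pt", padding=True)
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    text_sim = (text_emb @ img_emb[0:1].T).item()       # prompt vs. generated
    image_sim = (img_emb[0:1] @ img_emb[1:2].T).item()  # generated vs. reference
    return text_sim, image_sim
```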

Implications and Future Directions

Practically, ProSpect adapts to a range of image generation tasks. Material-, style-, and layout-aware generation all follow from the same per-stage representation, enabling high-fidelity images with robust contextual control from a single reference image and no model fine-tuning. The same mechanism supports personalized object generation, material and style transfer, and layout-guided synthesis, pointing to broad applicability in graphics; a sketch of attribute transfer via stage mixing follows.
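
One way to realize such attribute transfer, sketched under the assumption that two images have each been inverted into a list of per-stage embeddings (the helper name and split index are hypothetical), is to mix the two spectra stage-wise:

```python
import torch

def mix_prompt_spectra(spectrum_a: list[torch.Tensor],
                       spectrum_b: list[torch.Tensor],
                       b_from_stage: int = 7) -> list[torch.Tensor]:
    """Take early stages (layout/content) from A, late stages (style) from B.

    Assumes both spectra have the same number of stages, ordered from the
    noisiest (low-frequency) stage to the final (high-frequency) stage.
    The split index is illustrative; moving it changes which attributes
    transfer from B.
    """
    assert len(spectrum_a) == len(spectrum_b)
    return [b if i >= b_from_stage else a
            for i, (a, b) in enumerate(zip(spectrum_a, spectrum_b))]

# Example with 10 placeholder stages per image:
spec_a = [torch.randn(77, 768) for _ in range(10)]  # content/layout source
spec_b = [torch.randn(77, 768) for _ in range(10)]  # material/style source
mixed = mix_prompt_spectra(spec_a, spec_b)  # condition the sampler on this
```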

Looking forward, the methodology opens promising directions, particularly finer-grained attribute isolation and richer ways of interacting with the generation process. Future work might pursue more granular division and recombination of generation stages for sharper attribute-level personalization, and integrate these ideas with broader diffusion model frameworks for real-world use.

Conclusion

Overall, "ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models" presents a notable advancement in diffusion model capabilities, advocating for highly precise and adaptable image generation processes. Through the introduction and implementation of Prompt Spectrum, the research mitigates traditional challenges in visual attribute disentanglement, offering a robust framework for continued innovation within the field of generative AI.

Authors (9)
  1. Yuxin Zhang (91 papers)
  2. Weiming Dong (50 papers)
  3. Fan Tang (46 papers)
  4. Nisha Huang (10 papers)
  5. Haibin Huang (60 papers)
  6. Chongyang Ma (52 papers)
  7. Tong-Yee Lee (21 papers)
  8. Oliver Deussen (34 papers)
  9. Changsheng Xu (100 papers)
Citations (56)