
From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage (2401.05520v1)

Published 10 Jan 2024 in cs.CV, cs.AI, and cs.CL

Abstract: Generative AI has become pervasive in society and has advanced significantly across many domains. Among Text-to-Image (TTI) models, Latent Diffusion Models (LDMs) show remarkable capabilities in generating visual content from textual prompts. This paper examines the potential of LDMs to represent local cultural concepts, historical figures, and endangered species. We use the cultural heritage of Rio Grande do Sul (RS), Brazil, as an illustrative case. Our objective is to contribute to the broader understanding of how generative models can help capture and preserve the cultural and historical identity of regions. The paper outlines the methodology, including subject selection, dataset creation, and the fine-tuning process. The results present the generated images alongside the challenges and feasibility of each concept. In conclusion, this work shows the power of these models to represent and preserve unique aspects of diverse regions and communities.

