
Training-and-Prompt-Free General Painterly Harmonization via Zero-Shot Disentanglement on Style and Content References (2404.12900v2)

Published 19 Apr 2024 in cs.CV, cs.AI, and cs.MM

Abstract: Painterly image harmonization aims at seamlessly blending disparate visual elements within a single image. However, previous approaches often struggle due to limitations in training data or reliance on additional prompts, leading to inharmonious and content-disrupted output. To surmount these hurdles, we design a Training-and-prompt-Free General Painterly Harmonization method (TF-GPH). TF-GPH incorporates a novel "Similarity Disentangle Mask", which disentangles the foreground content and background image by redirecting their attention to corresponding reference images, enhancing the attention mechanism for multi-image inputs. Additionally, we propose a "Similarity Reweighting" mechanism to balance harmonization between stylization and content preservation. This mechanism minimizes content disruption by prioritizing the content-similar features within the given background style reference. Finally, we address the deficiencies in existing benchmarks by proposing novel range-based evaluation metrics and a new benchmark to better reflect real-world applications. Extensive experiments demonstrate the efficacy of our method in all benchmarks. More details at https://github.com/BlueDyee/TF-GPH.
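The abstract's "Similarity Disentangle Mask" can be pictured as a masked attention step: foreground query tokens are restricted to the content-reference keys/values, while background queries are restricted to the style-reference ones. The sketch below is a minimal NumPy illustration of that masking idea, not the paper's actual implementation; all function and variable names here (`disentangled_attention`, `fg_mask`, etc.) are invented for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def disentangled_attention(q, k_content, k_style, v_content, v_style, fg_mask):
    """Toy sketch of a similarity-disentangle mask (assumed semantics):
    foreground queries attend only to content-reference tokens,
    background queries attend only to style-reference tokens."""
    d = q.shape[-1]
    k = np.concatenate([k_content, k_style], axis=0)
    v = np.concatenate([v_content, v_style], axis=0)
    scores = q @ k.T / np.sqrt(d)          # (n_queries, n_content + n_style)
    n_c = k_content.shape[0]
    # Additive mask: -inf zeroes out the forbidden attention entries.
    mask = np.zeros_like(scores)
    mask[fg_mask, n_c:] = -np.inf          # foreground -> content tokens only
    mask[~fg_mask, :n_c] = -np.inf         # background -> style tokens only
    attn = softmax(scores + mask)
    return attn @ v
```

With distinct dummy values for the two references, a foreground query's output mixes only content values and a background query's output mixes only style values, which is the disentanglement the mask is meant to enforce. The paper's "Similarity Reweighting" would then further scale these attention weights by feature similarity, a step omitted here.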


