FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model (2311.14926v1)

Published 25 Nov 2023 in cs.CV and cs.AI

Abstract: This paper provides an efficient training-free painterly image harmonization (PIH) method, dubbed FreePIH, that leverages only a pre-trained diffusion model to achieve state-of-the-art harmonization results. Unlike existing methods that require training auxiliary networks, fine-tuning a large pre-trained backbone, or both, to harmonize a foreground object with a painterly-style background image, FreePIH tames the denoising process as a plug-in module for foreground image style transfer. Specifically, we find that the very last few steps of the denoising (i.e., generation) process strongly correspond to the stylistic information of images. Based on this, we propose to augment the latent features of both the foreground and background images with Gaussians for a direct denoising-based harmonization. To guarantee the fidelity of the harmonized image, we use multi-scale features to enforce content consistency and stability of the foreground objects in the latent space, while aligning both foreground and background to the same style. Moreover, to accommodate generation with more structural and textural details, we further integrate text prompts to attend to the latent features, improving the generation quality. Quantitative and qualitative evaluations on the COCO and LAION-5B datasets demonstrate that our method surpasses representative baselines by large margins.
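
The abstract does not spell out the noising procedure, so the following is only a minimal sketch of the general idea it builds on: perturbing a latent with Gaussian noise at a small timestep, so that only the last few denoising steps remain. It uses the standard DDPM forward process, z_t = sqrt(ᾱ_t) z_0 + sqrt(1 − ᾱ_t) ε. The linear beta schedule, the flat list standing in for a latent, and the function names `linear_alpha_bars`, `add_noise`, and `correlation` are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, alpha_bar, rng):
    """DDPM forward step: z_t = sqrt(alpha_bar) * z_0 + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * z + noise_scale * rng.gauss(0.0, 1.0) for z in latent]


def correlation(xs, ys):
    """Pearson correlation, used here as a rough proxy for preserved content."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    rng = random.Random(0)
    # Hypothetical stand-in for a flattened foreground latent.
    z0 = [rng.gauss(0.0, 1.0) for _ in range(2000)]

    # Compare a late-denoising (low-noise) timestep with the fully noised end.
    for step in (50, 1000):
        zt = add_noise(z0, alpha_bars[step - 1], rng)
        print(step, round(alpha_bars[step - 1], 4), round(correlation(z0, zt), 3))
```

Under these assumptions, noising to a small timestep leaves the latent highly correlated with the original, while noising to the final timestep leaves almost none of it. This is consistent with the abstract's observation that the last few denoising steps mainly affect style and not content.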
