CustomSketching: Sketch Concept Extraction for Sketch-based Image Synthesis and Editing (2402.17624v1)

Published 27 Feb 2024 in cs.CV and cs.GR

Abstract: Personalization techniques for large text-to-image (T2I) models allow users to incorporate new concepts from reference images. However, existing methods primarily rely on textual descriptions, leading to limited control over customized images and failing to support fine-grained and local editing (e.g., shape, pose, and details). In this paper, we identify sketches as an intuitive and versatile representation that can facilitate such control, e.g., contour lines capturing shape information and flow lines representing texture. This motivates us to explore a novel task of sketch concept extraction: given one or more sketch-image pairs, we aim to extract a special sketch concept that bridges the correspondence between the images and sketches, thus enabling sketch-based image synthesis and editing at a fine-grained level. To accomplish this, we introduce CustomSketching, a two-stage framework for extracting novel sketch concepts. Considering that an object can often be depicted by a contour for general shapes and additional strokes for internal details, we introduce a dual-sketch representation to reduce the inherent ambiguity in sketch depiction. We employ a shape loss and a regularization loss to balance fidelity and editability during optimization. Through extensive experiments, a user study, and several applications, we show our method is effective and superior to the adapted baselines.

Authors (2)
  1. Chufeng Xiao (7 papers)
  2. Hongbo Fu (67 papers)
Citations (1)
