
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models (2403.07234v2)

Published 12 Mar 2024 in cs.CV

Abstract: This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. We importantly democratise the process, enabling amateur sketches to generate precise images, living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from spatial-conditioning. To rectify this, we propose an abstraction-aware framework, utilising a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval model, working synergistically to reinforce fine-grained sketch-photo association. Our approach operates seamlessly during inference without the need for textual prompts; a simple, rough sketch akin to what you and I can create suffices! We welcome everyone to examine results presented in the paper and its supplementary. Contributions include democratising sketch control, introducing an abstraction-aware framework, and leveraging discriminative guidance, validated through extensive experiments.

Authors (7)
  1. Subhadeep Koley
  2. Ayan Kumar Bhunia
  3. Deeptanshu Sekhri
  4. Aneeshan Sain
  5. Pinaki Nath Chowdhury
  6. Tao Xiang
  7. Yi-Zhe Song
