
Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example (2401.04362v1)

Published 9 Jan 2024 in cs.CV, cs.AI, and cs.GR

Abstract: We introduce DiffSketch, a method for generating a variety of stylized sketches from images. Our approach focuses on selecting representative features from the rich semantics of deep features within a pretrained diffusion model. This novel sketch generation method can be trained with one manual drawing. Furthermore, efficient sketch extraction is ensured by distilling a trained generator into a streamlined extractor. We select denoising diffusion features through analysis and integrate these selected features with VAE features to produce sketches. Additionally, we propose a sampling scheme for training models using a conditional generative approach. Through a series of comparisons, we verify that distilled DiffSketch not only outperforms existing state-of-the-art sketch extraction methods but also surpasses diffusion-based stylization methods in the task of extracting sketches.

