Stylized Face Sketch Extraction via Generative Prior with Limited Data (2403.11263v1)
Abstract: Facial sketches are both a concise way of showing the identity of a person and a means to express artistic intention. While a few techniques have recently emerged that allow sketches to be extracted in different styles, they typically rely on a large amount of data that is difficult to obtain. Here, we propose StyleSketch, a method for extracting high-resolution stylized sketches from a face image. Using the rich semantics of the deep features from a pretrained StyleGAN, we are able to train a sketch generator with only 16 pairs of face images and their corresponding sketches. The sketch generator uses part-based losses with two-stage learning for fast convergence during training and high-quality sketch extraction. Through a set of comparisons, we show that StyleSketch outperforms existing state-of-the-art sketch extraction methods and few-shot image adaptation methods on the task of extracting high-resolution abstract face sketches. We further demonstrate the versatility of StyleSketch by extending its use to other domains and exploring the possibility of semantic editing. The project page can be found at https://kwanyun.github.io/stylesketch_project.