A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification (2403.14264v1)
Abstract: Portrait stylization is the challenging task of transforming an input portrait image into a specific style while preserving its inherent characteristics. The recent introduction of Stable Diffusion (SD) has significantly improved output quality in this field. However, a practical stylization framework that can effectively filter harmful input content and preserve the distinct characteristics of an input, such as skin tone, while maintaining stylization quality is still lacking, and this gap has hindered the wide deployment of such frameworks. To address these issues, this study proposes a portrait stylization framework that incorporates a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). In experiments, NCIM performed well at filtering explicit content, and STAPSM accurately represented a diverse range of skin tones. The proposed framework has been deployed in practice, where it has satisfied critical requirements of real-world applications.
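The two-module structure described in the abstract can be pictured as a gate followed by a stylizer. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the modules are connected, so running the nudity check before stylization is an assumption, and every name here (`run_framework`, `nudity_score`, `stylize`, `threshold`, `StylizationResult`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StylizationResult:
    accepted: bool          # False when the input was filtered out
    image: Optional[Any]    # stylized output, or None if rejected
    reason: str             # short human-readable status


def run_framework(
    image: Any,
    nudity_score: Callable[[Any], float],  # hypothetical stand-in for NCIM
    stylize: Callable[[Any], Any],         # hypothetical stand-in for STAPSM
    threshold: float = 0.5,                # assumed cutoff, not from the paper
) -> StylizationResult:
    """Reject inputs flagged as explicit, otherwise stylize them."""
    score = nudity_score(image)
    if score >= threshold:
        # Filtered inputs never reach the stylization stage.
        return StylizationResult(False, None, f"rejected: score {score:.2f}")
    return StylizationResult(True, stylize(image), "ok")


if __name__ == "__main__":
    # Toy stand-ins: a constant scorer and a string-based "stylizer".
    result = run_framework("portrait", lambda img: 0.1, lambda img: f"styled({img})")
    print(result)
```

Passing the two modules in as callables keeps the sketch independent of any particular classifier or diffusion model, so either stage could be swapped without touching the control flow.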