A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification (2403.14264v1)

Published 21 Mar 2024 in cs.CV and cs.AI

Abstract: Portrait stylization is a challenging task involving the transformation of an input portrait image into a specific style while preserving its inherent characteristics. The recent introduction of Stable Diffusion (SD) has significantly improved the quality of outcomes in this field. However, a practical stylization framework that can effectively filter harmful input content and preserve the distinct characteristics of an input, such as skin-tone, while maintaining the quality of stylization remains lacking. These challenges have hindered the wide deployment of such a framework. To address these issues, this study proposes a portrait stylization framework that incorporates a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). In experiments, NCIM showed good performance in enhancing explicit content filtering, and STAPSM accurately represented a diverse range of skin tones. Our proposed framework has been successfully deployed in practice, and it has effectively satisfied critical requirements of real-world applications.


Summary

  • The paper presents a dual-module framework that preserves authentic skin tones while filtering explicit content.
  • It employs a two-stage image-to-image process with skin-tone spectrum augmentation to enhance stylization quality.
  • The approach integrates CLIP and BLIP methods in its nudity identification module to reliably enforce ethical content standards.

A Framework Combining Portrait Stylization with Skin-Tone Awareness and Nudity Content Identification

Introduction to the Framework

Generative AI, notably models such as Stable Diffusion (SD), has significantly advanced portrait stylization: transforming an input image into a distinctive style while preserving inherent characteristics such as skin tone. However, filtering harmful input content and preserving skin-tone characteristics without compromising stylization quality has remained a challenge. This paper introduces a framework that addresses these concerns by integrating a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). The framework retains a broad spectrum of skin tones and strengthens explicit-content filtering, making it suitable for real-world applications.

Core Components of the Framework

Skin-Tone-Aware Portrait Stylization Module (STAPSM)

STAPSM combines a fine-tuning phase with skin-tone spectrum augmentation and a progressive inference phase, aiming to maintain the input's skin tones while achieving high-quality stylization. Skin-tone spectrum augmentation refines the training dataset to ensure a diverse representation of skin tones. Inference then uses a two-stage image-to-image (I2I) translation approach, applying different denoising strengths and image conditions at each stage, thereby preserving both the skin tone and the unique features of the various IPs (intellectual properties) being stylized.
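The paper does not publish its augmentation code, so the sketch below is only a rough illustration of the spectrum idea: it perturbs an image's tone along a lightness axis to generate training variants spanning darker and lighter skin renditions. The gamma-style adjustment and the specific factors are assumptions for illustration, not the authors' method.

```python
import numpy as np

def skin_tone_spectrum_augment(image, factors=(0.6, 0.8, 1.0, 1.2)):
    """Generate tone-shifted variants of an RGB image (H, W, 3) in [0, 1].

    Factors below 1.0 darken the image and factors above 1.0 lighten it,
    crudely widening the range of skin tones seen during fine-tuning.
    Illustrative only: the paper's actual augmentation is not public.
    """
    image = np.asarray(image, dtype=np.float32)
    variants = []
    for f in factors:
        # A gamma-style curve (x ** (1/f)) preserves the black and white
        # endpoints, unlike linear scaling, which clips highlights.
        variants.append(np.clip(image ** (1.0 / f), 0.0, 1.0))
    return variants

# Example: a tiny 2x2 mid-gray "image" produces four tone variants.
img = np.full((2, 2, 3), 0.5, dtype=np.float32)
augmented = skin_tone_spectrum_augment(img)
```

Any real implementation would operate on detected skin regions rather than the whole image, but the spectrum-of-variants structure is the relevant point here.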

Nudity Content Identification Module (NCIM)

NCIM combines the capabilities of the CLIP embedding classifier and BLIP caption-based keyword matching to filter harmful content effectively. By analyzing the biases and limitations in existing nudity filters, NCIM improves reliability in preventing the inadvertent generation or sharing of explicit content. This module represents an advancement in ensuring that generated images in applications align with content and ethical standards.
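The paper describes NCIM as combining a CLIP-embedding classifier with BLIP caption-based keyword matching, but gives no API. The sketch below shows one plausible OR-combination of the two signals; the threshold, keyword list, and the two scorer callables are stand-in assumptions, not the authors' implementation.

```python
from typing import Callable, Iterable

# Illustrative keyword list; the paper's actual list is not published.
NSFW_KEYWORDS = {"nude", "naked", "nsfw"}

def ncim_flag(image,
              clip_nsfw_score: Callable[[object], float],
              blip_caption: Callable[[object], str],
              threshold: float = 0.5,
              keywords: Iterable[str] = NSFW_KEYWORDS) -> bool:
    """Flag an image if EITHER detector fires.

    clip_nsfw_score: probability-like score from a CLIP-embedding classifier.
    blip_caption:    caption string from a BLIP captioning model.
    Combining both reduces the blind spots of a single detector, which is
    the stated motivation for the module.
    """
    if clip_nsfw_score(image) >= threshold:
        return True
    caption = blip_caption(image).lower()
    return any(kw in caption for kw in keywords)

# Usage with stub detectors: the caption trips the keyword match even
# though the classifier score is low.
flagged = ncim_flag(None,
                    clip_nsfw_score=lambda _: 0.1,
                    blip_caption=lambda _: "a nude figure on a beach")
```

The OR-combination trades some false positives for higher recall, which matches the paper's emphasis on reliably blocking explicit content in a deployed service.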

Empirical Evaluation

The framework demonstrated superior performance in preserving the diverse range of skin tones, significantly outperforming existing methods in qualitative assessments and user studies among professionals in the Webtoon industry. Moreover, the NCIM showcased remarkable accuracy and reliability in identifying and filtering nudity content, highlighting the efficacy of combining embedding-based classifiers with keyword-based matching techniques.

Practical Implications and Future Outlook

The deployment of this framework in a real-world portrait stylization service has generated over 2 million images across several popular Webtoon IPs, receiving positive feedback from users for its skin-tone representation capabilities. The robustness of the NCIM has effectively deterred the generation of explicit content, thereby safeguarding the value of IPs. This paper not only addresses existing limitations in portrait stylization technologies but also sets a foundation for future research focused on enhancing generative AI's ethical use and inclusivity.

Conclusion

This paper presents a comprehensive framework that innovatively combines skin-tone-aware portrait stylization with effective nudity content identification. It offers a significant step toward ethical and inclusive generative AI applications, particularly in settings where preserving user characteristics and preventing harmful content generation are critical. The framework's deployment and its success in real-world applications underscore its potential to shape future directions in AI-powered content creation, with an emphasis on ethical considerations and diversity.