Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models (2403.13250v1)
Abstract: Pornographic content in human-machine interaction dialogues can cause severe side effects for users of open-domain dialogue systems. However, detecting pornographic language in such dialogues is an important problem that has rarely been studied. To advance in this direction, we introduce CensorChat, a dialogue monitoring dataset for detecting whether a dialogue session contains pornographic content. We collect real-life human-machine interaction dialogues in the wild and break them down into single utterances and single-turn dialogues, with the last utterance spoken by the chatbot. We propose using knowledge distillation of LLMs to annotate the dataset. First, the raw dataset is annotated by four open-source LLMs, with the majority vote determining the label. Second, we use ChatGPT to fill in the labels left empty by the first step. Third, to ensure the quality of the validation and test sets, we use GPT-4 for label calibration: if the current label does not match the one generated by GPT-4, we employ a self-criticism strategy to verify its correctness. Finally, to facilitate the detection of pornographic text, we develop a series of text classifiers using the pseudo-labeled dataset. Detailed data analysis demonstrates that knowledge distillation with LLMs provides a practical and cost-efficient method for developing pornographic text detectors.
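The three annotation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the annotator, fallback, reference, and critic callables are hypothetical stand-ins for the four open-source LLMs, ChatGPT, GPT-4, and the self-criticism step, and the label strings are placeholders. The sketch also assumes that an "empty label" arises when the vote produces no unique winner (for example, a 2-2 tie), which the abstract does not state.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical type: any function mapping a text to a label (or None).
Annotator = Callable[[str], Optional[str]]


def majority_vote(votes: Sequence[Optional[str]]) -> Optional[str]:
    """Return the unique most common label, or None on a tie or no votes."""
    valid = [v for v in votes if v is not None]
    if not valid:
        return None
    ranked = Counter(valid).most_common()
    top_label, top_count = ranked[0]
    # Assumption: a tie for first place leaves the label empty.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


def annotate(text: str, annotators: Sequence[Annotator], fallback: Annotator) -> Optional[str]:
    """Steps 1-2: vote among annotators, then ask a fallback model if empty."""
    label = majority_vote([annotate_one(text) for annotate_one in annotators])
    if label is None:
        label = fallback(text)
    return label


def calibrate(
    text: str,
    label: str,
    reference: Callable[[str], str],
    critic: Callable[[str, str, str], str],
) -> str:
    """Step 3: keep the label if the reference model agrees, else defer to a critic.

    The critic signature (text, current label, reference label) is an assumption.
    """
    reference_label = reference(text)
    if reference_label == label:
        return label
    return critic(text, label, reference_label)
```

In this sketch, `calibrate` would be applied only to validation and test examples, matching the abstract's statement that GPT-4 calibration targets those splits.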
Authors: Huachuan Qiu, Shuai Zhang, Hongliang He, Anqi Li, Zhenzhong Lan