AI Content Self-Detection for Transformer-based Large Language Models (2312.17289v1)
Abstract: The use of generative AI tools based on LLMs, including ChatGPT, Bard, and Claude, for text generation has many exciting applications with the potential for phenomenal productivity gains. One issue is authorship attribution when using AI tools. This is especially important in an academic setting, where the inappropriate use of generative AI tools may hinder student learning or stifle research by creating a large amount of automatically generated derivative work. Existing plagiarism detection systems can trace the source of submitted text but are not yet equipped to accurately detect AI-generated text. This paper introduces the idea of direct origin detection and evaluates whether generative AI systems can recognize their own output and distinguish it from human-written text. We argue why current transformer-based models may be able to self-detect their own generated text and perform a small empirical study using zero-shot learning to investigate whether that is the case. The results reveal varying capabilities of AI systems to identify their generated text. Google's Bard model exhibits the strongest self-detection capability with an accuracy of 94%, followed by OpenAI's ChatGPT with 83%. Anthropic's Claude model, on the other hand, appears unable to self-detect.
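The zero-shot self-detection setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact protocol: the prompt wording, the `ask_model` callable, and the toy stub model are all assumptions introduced for the example.

```python
def build_prompt(text: str) -> str:
    """Zero-shot prompt asking a model whether it authored the given text.

    The exact wording is an assumption; the paper's prompts may differ.
    """
    return (
        "Did you generate the following text? "
        "Answer with exactly 'yes' or 'no'.\n\n" + text
    )


def parse_answer(reply: str) -> bool:
    """Map a free-form model reply to a binary self-detection verdict."""
    return reply.strip().lower().startswith("yes")


def self_detection_accuracy(samples, ask_model) -> float:
    """Score a model's ability to recognize its own output.

    samples:   list of (text, was_generated_by_this_model) pairs.
    ask_model: callable sending a prompt to the LLM and returning its reply
               (in practice, a call to the Bard, ChatGPT, or Claude API).
    """
    correct = 0
    for text, is_ai in samples:
        verdict = parse_answer(ask_model(build_prompt(text)))
        correct += int(verdict == is_ai)
    return correct / len(samples)


# Toy stub standing in for a real LLM endpoint: it "recognizes" texts
# containing a marker token, so the pipeline can be exercised offline.
def toy_model(prompt: str) -> str:
    return "yes" if "[AI]" in prompt else "no"


samples = [
    ("[AI] essay one", True),
    ("human-written essay", False),
    ("[AI] essay two", True),
    ("another human-written essay", False),
]
print(self_detection_accuracy(samples, toy_model))  # 1.0 on this toy data
```

Swapping `toy_model` for a real API client yields the per-model accuracies the study compares (e.g. 94% for Bard, 83% for ChatGPT).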
- Antônio Junior Alves Caiado
- Michael Hahsler