
Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level (2306.08122v1)

Published 13 Jun 2023 in cs.CL, cs.AI, and cs.LG

Abstract: The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using NLP techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and feeding them to the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.
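The matching step the abstract describes (compare each sentence of a student's response with sentences from LLM answers, then roll sentence-level scores up into a document-level score) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the paraphrase-and-query step is assumed to have already produced `generated_answers`; a bag-of-words vector stands in for a learned sentence embedding; and the names `sentence_scores`, `document_score`, and the 0.8 threshold are hypothetical. The `cosine_embedding_loss` function shows one common margin-based contrastive loss on cosine similarity (the form used by PyTorch's `CosineEmbeddingLoss`); the paper's exact loss may differ, and training an embedding with it is out of scope here.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def embed(sentence):
    """Bag-of-words term counts: a hypothetical stand-in for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_embedding_loss(cos_sim, label, margin=0.0):
    """Margin-based contrastive loss on a cosine similarity.

    label=1 (similar pair): loss = 1 - cos, pulling the pair together.
    label=-1 (dissimilar pair): loss = max(0, cos - margin), pushing it apart.
    Assumed formulation; not necessarily the one used in the paper.
    """
    if label == 1:
        return 1.0 - cos_sim
    return max(0.0, cos_sim - margin)


def sentence_scores(student_text, generated_answers):
    """For each student sentence, keep its best cosine match among all generated sentences."""
    generated = [embed(s) for answer in generated_answers for s in split_sentences(answer)]
    results = []
    for sentence in split_sentences(student_text):
        vec = embed(sentence)
        best = max((cosine(vec, g) for g in generated), default=0.0)
        results.append((sentence, best))
    return results


def document_score(scores, threshold=0.8):
    """Fraction of student sentences whose best match meets the (hypothetical) threshold."""
    if not scores:
        return 0.0
    flagged = sum(1 for _, s in scores if s >= threshold)
    return flagged / len(scores)


if __name__ == "__main__":
    # In the described pipeline these answers would come from prompting the LLM
    # with several paraphrases of the assignment question; here they are hard-coded.
    generated_answers = [
        "Photosynthesis converts light energy into chemical energy. It happens in chloroplasts.",
        "Plants use sunlight to make glucose from carbon dioxide and water.",
    ]
    student = "Photosynthesis converts light energy into chemical energy. My cat likes to sleep."
    scores = sentence_scores(student, generated_answers)
    for sentence, score in scores:
        print(f"{score:.2f}  {sentence}")
    print("document score:", document_score(scores))
```

Keeping per-sentence scores, rather than a single classifier verdict, is what makes the result inspectable: an evaluator can see exactly which sentences matched generated text and how closely.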

Authors (3)
  1. Mujahid Ali Quidwai (1 paper)
  2. Chunhui Li (24 papers)
  3. Parijat Dube (19 papers)
Citations (13)