Machine-Generated Text Detection using Deep Learning

Published 26 Nov 2023 in cs.CL (arXiv:2311.15425v1)

Abstract: Our research addresses the challenge of discerning text produced by LLMs from human-written text, a capability significant for many applications. Amid ongoing debate about whether such detection is attainable, we present evidence supporting its feasibility. We evaluate our models on multiple datasets, including Twitter Sentiment, Football Commentary, Project Gutenberg, PubMedQA, and SQuAD, confirming the efficacy of the enhanced detection approaches. These datasets were sampled under carefully varied constraints, laying a foundation for future research. We pit GPT-3.5-Turbo-generated text against detectors including an SVM, RoBERTa-base, and RoBERTa-large, and find that detection performance depends predominantly on sentence length.

References (12)
  1. Tom Brown et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
  2. Souradip Chakraborty et al. 2023. On the possibilities of AI-generated text detection. arXiv preprint arXiv:2304.04736.
  3. Evan Crothers, Nathalie Japkowicz, and Herna Viktor. 2023. Machine-generated text: A comprehensive survey of threat models and detection methods. IEEE Access.
  4. Bijoyan Das and Sarit Chakraborty. 2018. An improved text sentiment classification model using TF-IDF and next word negation. arXiv preprint arXiv:1806.06407.
  5. GPTZero. 2023. GPTZero website.
  6. Biyang Guo et al. 2023. How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection. arXiv preprint arXiv:2301.07597.
  7. John Kirchenbauer et al. 2023. A watermark for large language models. arXiv preprint arXiv:2301.10226.
  8. Tharindu Kumarage et al. 2023. Stylometric detection of AI-generated text in Twitter timelines. arXiv preprint arXiv:2303.03697.
  9. Eric Mitchell et al. 2023. DetectGPT: Zero-shot machine-generated text detection using probability curvature. arXiv preprint arXiv:2301.11305.
  10. Alec Radford et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
  11. Colin Raffel et al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(1):5485–5551.
  12. Irene Solaiman et al. 2019. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203.

Summary

  • The paper demonstrates that deep learning models, especially RoBERTa-based systems, significantly outperform SVM in detecting machine-generated text.
  • It employs a comprehensive dataset segmented by sentence length to capture subtle linguistic patterns and text complexities.
  • The findings validate the capability to distinguish human-written from AI-generated text, enhancing the integrity of digital communication.

Introduction

The proliferation of LLMs such as GPT-3.5 Turbo has revolutionized various industries by automating content creation. However, this also raises concerns about distinguishing between human and machine-generated text. The integrity of digital communication relies on our ability to make such distinctions.

Previous efforts in machine-generated text detection have centered on identifying general characteristics of AI-generated content. Some researchers have proposed watermarking techniques to embed detectable signals in LLM output. Others focused on stylometric detection to differentiate AI-produced tweets by analyzing linguistic features. Additionally, tools like GPTZero utilize metrics such as perplexity and burstiness to identify machine-generated text.
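
To make the perplexity and burstiness signals concrete, the sketch below scores sentences with a smoothed unigram model standing in for a real LLM's token probabilities. The toy reference corpus and function names are illustrative, not part of GPTZero's actual implementation.

```python
import math
from collections import Counter

def unigram_perplexity(tokens, model_counts, total, vocab_size):
    """Perplexity of a token sequence under a unigram model
    with add-one (Laplace) smoothing: exp of the average
    negative log-likelihood per token."""
    log_prob = 0.0
    for tok in tokens:
        p = (model_counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))

def burstiness(sentences, model_counts, total, vocab_size):
    """Standard deviation of per-sentence perplexity: human text
    tends to vary more from sentence to sentence than LLM output."""
    ppls = [unigram_perplexity(s.split(), model_counts, total, vocab_size)
            for s in sentences]
    mean = sum(ppls) / len(ppls)
    return (sum((p - mean) ** 2 for p in ppls) / len(ppls)) ** 0.5

# Toy "language model": token counts from a small reference corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(corpus)
total, vocab = len(corpus), len(counts)

sentences = ["the cat sat on the mat", "quantum flux collapsed abruptly"]
print(burstiness(sentences, counts, total, vocab))
```

In a real detector the unigram model would be replaced by an autoregressive LLM's conditional token probabilities, but the decision logic (low perplexity and low burstiness suggest machine-generated text) is the same.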

Proposed Approach

The study proposes a comprehensive approach to discriminating between human- and LLM-generated sentences. The researchers build a dataset spanning a broad spectrum of sentences from different domains. By training an SVM baseline alongside deep learning models (RoBERTa-Base and RoBERTa-Large) on this dataset, the team seeks to identify subtleties in language patterns exclusive to AI-generated content.

Model Description

The SVM model uses a radial basis function (RBF) kernel over TF-IDF feature representations to classify text. While it provides a solid baseline, it is surpassed by the RoBERTa-based architectures on more complex inputs. The RoBERTa-Base and RoBERTa-Large models, extended with additional classification layers, leverage their deep understanding of linguistic context to outperform the SVM, especially as sentence length increases.
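
A minimal sketch of the SVM baseline described above, assuming a standard scikit-learn pipeline; the toy texts and labels are invented for illustration and are not the paper's dataset.

```python
# TF-IDF features fed to an RBF-kernel SVM, as in the baseline above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

human = ["honestly the match was a rollercoaster last night",
         "my cat knocked the coffee straight off the desk again"]
machine = ["The match exhibited notable fluctuations in momentum throughout.",
           "The feline displaced the beverage container from the work surface."]

X = human + machine
y = [0, 0, 1, 1]  # 0 = human-written, 1 = machine-generated

clf = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(["the referee totally lost control of that game"]))
```

The RoBERTa models would replace this pipeline with a fine-tuned transformer encoder plus a classification head, trading the sparse TF-IDF representation for contextual embeddings.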

Experimental Setup

The study partitions the dataset into sentence-length ranges to analyze model performance in fine detail. It records the Area Under the Receiver Operating Characteristic curve (AUROC) for each model and range, providing insight into each model's ability to handle varying textual complexity.

Results and Discussion

The experiments show that the RoBERTa models are particularly effective, with RoBERTa-Large dominant on longer and more complex sentences. The SVM performs respectably but lacks the sophistication of the RoBERTa models. The findings underscore the effectiveness of the proposed models for determining text origin.

Conclusion and Future Work

The study demonstrates the feasibility of distinguishing between human- and ChatGPT-generated text. Future work incorporating broader datasets and additional LLMs may enrich detection capabilities, and exploring methodologies such as zero-shot or one-shot learning could yield more resource-efficient classifiers.

In summary, this investigation into machine-generated text detection advances our understanding of how deep learning can be leveraged to uphold the authenticity of digital communication in the face of increasingly sophisticated AI LLMs.
