
Few-Shot Detection of Machine-Generated Text using Style Representations (2401.06712v3)

Published 12 Jan 2024 in cs.CL and cs.LG

Abstract: The advent of instruction-tuned LLMs that convincingly mimic human writing poses a significant risk of abuse. However, such abuse may be counteracted with the ability to detect whether a piece of text was composed by a LLM rather than a human author. Some previous approaches to this problem have relied on supervised methods by training on corpora of confirmed human- and machine- written documents. Unfortunately, model under-specification poses an unavoidable challenge for neural network-based detectors, making them brittle in the face of data shifts, such as the release of newer LLMs producing still more fluent text than the models used to train the detectors. Other approaches require access to the models that may have generated a document in question, which is often impractical. In light of these challenges, we pursue a fundamentally different approach not relying on samples from LLMs of concern at training time. Instead, we propose to leverage representations of writing style estimated from human-authored text. Indeed, we find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors, including state-of-the-art LLMs like Llama-2, ChatGPT, and GPT-4. Furthermore, given a handful of examples composed by each of several specific LLMs of interest, our approach affords the ability to predict which model generated a given document. The code and data to reproduce our experiments are available at https://github.com/LLNL/LUAR/tree/main/fewshot_iclr2024.

Introduction

In the field of AI, LLMs have become increasingly sophisticated, able to generate text nearly indistinguishable from human writing. While these advancements have many positive applications, they also pose a risk when used maliciously for plagiarism, disinformation, and other deceptive practices. The challenge is detecting whether text has been generated by a machine, particularly as models evolve and new ones are introduced, often surpassing the capabilities of existing detection systems. Traditional detection methods depend heavily on supervised learning with large datasets of machine vs. human text but are often unsuitable for next-generation models not present in the training data.

Style-based Detection Approach

A novel approach is proposed that shifts the focus from content to style. Unlike content, which varies with topic or prompt, an author's writing style carries idiosyncratic features across their work. The method leverages style representations learned from large corpora of human-authored text to distinguish between human and machine writing. Initial findings reveal that features which distinguish different human authors can also be used to separate human authorship from machine-generated content, even for advanced LLMs like Llama 2, ChatGPT, and GPT-4. An advantage of this technique is its adaptability: it is effective with only a handful of examples from each LLM of interest, hence the term "few-shot detection."
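The few-shot idea above can be sketched with a nearest-centroid rule over style embeddings. This is an illustrative toy, not the paper's implementation: the random vectors below stand in for the output of a trained style encoder, and the `detect` function and its names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect(doc_embedding, human_refs, machine_refs):
    """Nearest-centroid decision: compare a document's style embedding
    to the mean embedding of a few human-written and a few
    machine-written reference documents."""
    h_centroid = np.mean(human_refs, axis=0)
    m_centroid = np.mean(machine_refs, axis=0)
    if cosine(doc_embedding, m_centroid) > cosine(doc_embedding, h_centroid):
        return "machine"
    return "human"

# Toy 4-d "style embeddings"; a real system would embed text
# with a style encoder such as the one the paper builds on.
rng = np.random.default_rng(0)
human_refs = rng.normal(loc=0.0, size=(5, 4))
machine_refs = rng.normal(loc=3.0, size=(5, 4))
query = rng.normal(loc=3.0, size=4)
print(detect(query, human_refs, machine_refs))
```

With a handful of reference documents per specific LLM, the same centroid comparison extends naturally to predicting *which* model generated a document, as the paper describes.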

Methodology and Experimentation

The research details several experiments and methodologies. The paper defines effectiveness as the ability to detect machine-produced content at very low false-alarm rates, which is critical for practical scenarios such as academic plagiarism detection or filtering out AI-generated spam. It contrasts this approach with well-known methods such as OpenAI's text classifier, highlighting their limitations when facing novel, unseen machine-written content.
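Operating at a fixed low false-alarm rate amounts to calibrating a decision threshold on held-out human-written text. The sketch below, with made-up scores and a hypothetical `threshold_at_fpr` helper, shows one common way to do this; it is not the paper's evaluation code.

```python
import numpy as np

def threshold_at_fpr(human_scores, target_fpr=0.01):
    """Choose a detector threshold so that at most `target_fpr`
    of human-written documents score above it (a false alarm)."""
    # the (1 - target_fpr) quantile of detector scores on human text
    return float(np.quantile(np.asarray(human_scores), 1.0 - target_fpr))

# made-up detector scores on held-out human-written documents
human_scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.18, 0.22, 0.28, 0.12]
t = threshold_at_fpr(human_scores, target_fpr=0.10)
flagged = [s for s in human_scores if s > t]
print(t, len(flagged))  # at most ~10% of human documents are flagged
```

Machine-generated documents are then flagged only when their score exceeds `t`, so the false-alarm budget is respected by construction on the calibration set.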

The paper shows that several style representation techniques are effective at identifying machine-generated text, even when trained mostly on human writing. These include adapting to multi-domain data (incorporating stylistic signals from different platform sources) and training on documents generated by openly accessible LLMs to improve detection of text from more powerful or newer models. The authors also release openly accessible datasets to the scholarly community, promoting further exploration and validation of detection methods.

Evaluating Robustness

Another essential component of the method is its robustness to countermeasures, such as paraphrasing text to evade detection. The authors demonstrate that the approach remains effective even against such adversarially adapted content. Because models evolve continuously, a practical framework must cope with this changing landscape and be able to flag abuse by previously unseen LLMs.

Conclusion and Impact

The proposed method is innovative in using style as a detection signal, delivering a practical, scalable, and adaptable tool to combat machine-text abuse while keeping false positives low. The research emphasizes that as LLMs become more mainstream, strategies to distinguish AI authorship from human writing will be vital. Future work includes extending the approach to languages beyond English, which is especially important for widely used languages with a rich internet presence.

As AI continues to advance, transparency, accountability, and controls for LLMs are essential, and the researchers aim to contribute tools that help stakeholders across varied sectors uphold integrity in information dissemination. The results encourage adoption of this methodology in settings that require an immediate line of detection defense.

References (40)
  1. 2023. OpenAI ChatGPT API “gpt-3.5-turbo”. Available at: https://api.openai.com/v1/chat/completions.
  2. Nicholas Andrews and Marcus Bishop. 2019. Learning invariant representations of social media users. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1684–1695.
  3. The Pushshift Reddit Dataset. In Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), volume 14, pages 830–839.
  4. Language models are few-shot learners.
  5. Scaling instruction-finetuned language models.
  6. Free dolly: Introducing the world’s first truly open instruction-tuned llm.
  7. Roft: A tool for evaluating human detection of machine-generated text.
  8. Model-agnostic meta-learning for fast adaptation of deep networks.
  9. Unsupervised and distributional detection of machine-generated text. arXiv preprint arXiv:2111.02878.
  10. Gltr: Statistical detection and visualization of generated text. arXiv preprint arXiv:1906.04043.
  11. Tilmann Gneiting and Adrian E Raftery. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378.
  12. On calibration of modern neural networks. In International conference on machine learning, pages 1321–1330. PMLR.
  13. Julian Hazell. 2023. Large language models can be used to effectively scale spear phishing campaigns. arXiv preprint arXiv:2305.06972.
  14. Cater: Intellectual property protection on text generation apis via conditional watermarks.
  15. Automatic detection of generated text is easiest when humans are fooled. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1808–1822.
  16. Automatic detection of machine generated text: A critical survey. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2296–2309, Barcelona, Spain (Online). International Committee on Computational Linguistics.
  17. A watermark for large language models. arXiv preprint arXiv:2301.10226.
  18. Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense. ArXiv, abs/2303.13408.
  19. Identifying automatically generated headlines using transformers. arXiv preprint arXiv:2009.13375.
  20. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
  21. Detectgpt: Zero-shot machine-generated text detection using probability curvature.
  22. Crosslingual generalization through multitask finetuning.
  23. Do deep generative models know what they don’t know? arXiv preprint arXiv:1810.09136.
  24. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 188–197.
  25. OpenAI. 2023. Gpt-4 technical report.
  26. Low-resource authorship style transfer: Can non-famous authors be imitated?
  27. John Platt et al. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3):61–74.
  28. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  29. A recipe for arbitrary text style transfer with large language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 837–848, Dublin, Ireland. Association for Computational Linguistics.
  30. Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
  31. Learning universal authorship representations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 913–919, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  32. Can ai-generated text be reliably detected?
  33. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30.
  34. Release strategies and the social impacts of language models (arxiv:1908.09203). https://huggingface.co/roberta-base-openai-detector.
  35. Llama 2: Open foundation and fine-tuned chat models.
  36. M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection.
  37. Same author or just same topic? Towards content-independent style representations. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 249--268. Association for Computational Linguistics.
  38. Taxonomy of risks posed by language models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 214--229.
  39. Defending against neural fake news. Advances in neural information processing systems, 32.
  40. Opt: Open pre-trained transformer language models.
Authors (6)
  1. Rafael Rivera Soto (4 papers)
  2. Kailin Koch (1 paper)
  3. Aleem Khan (6 papers)
  4. Barry Chen (5 papers)
  5. Marcus Bishop (7 papers)
  6. Nicholas Andrews (22 papers)
Citations (11)