
Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text (2401.09407v3)

Published 17 Jan 2024 in cs.CL and cs.LG

Abstract: With the recent proliferation of LLMs, there has been an increasing demand for tools to detect machine-generated text. Effective detection of machine-generated text faces two pertinent problems: First, detectors are severely limited in generalizing to real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study of the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that they cannot reliably distinguish between human and machine-generated text. Based on our findings, we introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering to address text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that it provides state-of-the-art generalization ability, with an average increase in F1 score on machine-generated text of 19.6% on unseen generators and domains compared to the top-performing existing approaches, and correctly attributes the generator of text with an accuracy of 93.6%.

Overview of the Paper

The expansion of LLMs like GPT-3 and its successors has revolutionized language processing, producing text that is often indistinguishable from human writing. This advancement creates a pressing need for systems that can identify whether a given text was written by a human or generated by a machine. Existing detection methods, however, struggle with the diversity of text generators and domains encountered in real-world contexts. This paper presents a critical analysis of these limitations and introduces T5LLMCipher, a new system designed to improve the detection of machine-generated text. It combines a pretrained T5 encoder with a novel embedding sub-clustering approach. The system outperformed state-of-the-art methods when tested across various LLMs and content domains.

State-of-the-Art Limitations & Proposed Approach

State-of-the-art methods for detecting machine-generated text often fall short in real-world applications. They are generally limited by two significant issues: first, an inability to generalize across the wide array of generators and domains; and second, an oversimplification of the problem as a binary classification task, which ignores nuanced differences between generators. To address these issues, the authors propose T5LLMCipher. This system applies the embeddings from a pretrained T5 encoder to build a detection mechanism that can accurately identify machine-generated text and attribute it to its respective generator, thereby recognizing specific 'fingerprints' unique to different text-producing LLMs.
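The attribution idea can be illustrated with a small numpy sketch. This is not the paper's implementation: the embeddings below are synthetic Gaussian clouds (in the paper they would come from a pretrained T5 encoder), the sub-clustering is a minimal hand-rolled k-means, and generator names such as `gpt4` and `dolly` are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for encoder embeddings: each generator's texts form a
# Gaussian cloud around a generator-specific mean (its assumed "fingerprint").
generators = ["human", "gpt4", "dolly"]
dim = 32
means = {g: rng.normal(0.0, 1.0, dim) for g in generators}

def embed(generator, n):
    """Simulate n embeddings for one generator (assumption: Gaussian clouds)."""
    return means[generator] + 0.3 * rng.normal(0.0, 1.0, (n, dim))

def kmeans(X, k=2, iters=20):
    """Tiny k-means used to sub-cluster one generator's embeddings."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        centroids = np.stack([
            X[labels == j].mean(0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids

train = {g: embed(g, 200) for g in generators}
subclusters = {g: kmeans(X) for g, X in train.items()}

def attribute(x):
    """Attribute an embedding to the generator with the nearest sub-cluster centroid."""
    dists = {g: float(((c - x) ** 2).sum(-1).min()) for g, c in subclusters.items()}
    return min(dists, key=dists.get)

held_out = [(g, x) for g in generators for x in embed(g, 50)]
acc = float(np.mean([attribute(x) == g for g, x in held_out]))
print(f"attribution accuracy on synthetic embeddings: {acc:.2f}")
```

On well-separated synthetic clouds the nearest-sub-cluster rule attributes essentially all held-out points correctly; the point of the sketch is only the mechanism of modeling each generator with several centroids rather than one.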

Insights from Embedding Analysis

The core of the system is informed by an analysis of embeddings: high-dimensional representations of text content produced by an existing LLM encoder. These embeddings can capture the linguistic nuances and distinct features that differentiate human from machine-generated text. Using t-SNE, a dimensionality-reduction technique that projects high-dimensional embeddings into two dimensions for visual inspection, the authors found that raw embeddings alone cannot reliably separate human from machine-generated text; this motivated sub-clustering the embeddings by generator, which exposes identifiable, generator-specific characteristics that can be quantitatively discerned. This insight was key in designing a system that can not only detect machine-generated text but also attribute it to particular generators effectively.
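A minimal sketch of the t-SNE inspection step, using scikit-learn's `TSNE`. The embeddings here are random stand-ins (real inputs would be T5 encoder outputs, one vector per document), and the mean shift given to the "machine" cloud is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Random stand-ins for encoder embeddings; real inputs would be the
# high-dimensional outputs of a pretrained T5 encoder, one per document.
human = rng.normal(0.0, 1.0, (60, 64))
machine = rng.normal(0.5, 1.0, (60, 64))  # assumed shift, for illustration only
X = np.vstack([human, machine])

# t-SNE projects the 64-d embeddings to 2-d while preserving local
# neighborhoods, so overlap (or separation) between the two clouds
# can be inspected visually, e.g. in a scatter plot.
coords = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(X)
print(coords.shape)  # one 2-d point per input embedding
```

Plotting `coords` colored by source is the kind of visualization the authors used to argue that raw embeddings overlap too much for reliable binary separation.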

Validation and Results

Comprehensive testing was conducted to validate the new system. T5LLMCipher was tasked with identifying machine-generated text across nine text domains and nine machine text generators. The evaluation showed that T5LLMCipher improved the F1 score on machine-generated text by an average of 19.6% on unseen generators and domains relative to the top-performing existing approaches, and attributed text to its generator with 93.6% accuracy. Furthermore, the system demonstrated resilience against adversarial attacks aimed at bypassing detection mechanisms, a scenario increasingly relevant as machine-generated text becomes more prevalent and sophisticated.

In summary, the research confirms that while current state-of-the-art detectors are limited in practical application, the use of LLM encoder embeddings presents a promising avenue for accurately detecting and classifying machine-generated text across a variety of real-world scenarios. T5LLMCipher stands as a substantial advancement, bringing us closer to effectively discerning the authenticity of digital content in an era shaped by machine learning's growing influence on text creation.

Authors (6)
  1. Mazal Bethany
  2. Brandon Wherry
  3. Emet Bethany
  4. Nishant Vishwamitra
  5. Peyman Najafirad
  6. Anthony Rios