From Text to Source: Results in Detecting Large Language Model-Generated Content (2309.13322v2)
Abstract: The widespread use of LLMs, celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns requires robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection": whether a classifier trained to distinguish text generated by a source LLM from human-written text can also detect text from a target LLM without further training. The study covers a range of LLM sizes and families and assesses the impact of conversational fine-tuning, quantization, and watermarking on classifier generalization. It also explores model attribution, encompassing source-model identification, model-family and model-size classification, and quantization and watermarking detection. The results reveal several key findings: a clear inverse relationship between classifier effectiveness and model size, with larger LLMs being harder to detect, especially when the classifier is trained on data from smaller models. Training on data from similarly sized LLMs improves detection of larger models but can degrade performance on smaller ones. Model-attribution experiments show promising results in identifying source models and model families, indicating detectable signatures in LLM-generated text, with particularly strong outcomes in watermarking detection; no detectable signatures of quantization were observed. Overall, the study offers insights into the interplay of model size, family, and training data in LLM detection and attribution.
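The cross-model detection setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: tiny hand-written toy corpora stand in for the human-written and LLM-generated datasets, and a TF-IDF plus logistic-regression classifier stands in for the paper's fine-tuned transformer detector. The key point is the evaluation protocol: train only on human vs. source-model text, then score text from a different target model without further training.

```python
# Sketch of "Cross-Model Detection": train a human-vs-machine classifier
# on text from a SOURCE model, then apply it zero-shot to a TARGET model.
# Toy data and a simple bag-of-words classifier are placeholders for the
# paper's real corpora and fine-tuned transformer detector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Label 0 = human-written, label 1 = machine-generated (toy examples).
human_texts = [
    "I walked to the market and bought some fresh bread.",
    "She laughed at the joke before answering the phone.",
]
source_llm_texts = [
    "As an AI language model, I can certainly help with that request.",
    "In conclusion, the aforementioned points demonstrate this clearly.",
]
target_llm_texts = [
    "As a language model, I am happy to assist with this request.",
    "In summary, the points above demonstrate the idea clearly.",
]

# Train on human vs. SOURCE-model text only.
X_train = human_texts + source_llm_texts
y_train = [0, 0, 1, 1]
detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(X_train, y_train)

# Zero-shot transfer: score text from a different TARGET model.
preds = detector.predict(target_llm_texts)
print(preds)
```

In the paper's experiments, this transfer is repeated across pairs of models of different sizes and families, which is how the reported size and family effects on generalization are measured.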
Authors: Wissam Antoun, Benoît Sagot, Djamé Seddah