Distinguishing Chatbot from Human (2408.04647v1)

Published 3 Aug 2024 in cs.CL and cs.LG

Abstract: There have been many recent advances in the fields of generative AI and large language models (LLMs), with the Generative Pre-trained Transformer (GPT) model being a leading "chatbot." LLM-based chatbots have become so powerful that it may seem difficult to differentiate between human-written and machine-generated text. To analyze this problem, we have developed a new dataset consisting of more than 750,000 human-written paragraphs, with a corresponding chatbot-generated paragraph for each. Based on this dataset, we apply machine learning (ML) techniques to determine the origin of text (human or chatbot). Specifically, we consider two methodologies for tackling this issue: feature analysis and embeddings. Our feature analysis approach involves extracting a collection of features from the text for classification. We also explore the use of contextual embeddings and transformer-based architectures to train classification models. Our proposed solutions offer high classification accuracy and serve as useful tools for textual analysis, resulting in a better understanding of chatbot-generated text in this era of advanced AI technology.
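
As a rough illustration of the feature analysis idea described in the abstract, the sketch below turns one paragraph into a small numeric feature vector that a classifier could consume. The abstract does not list the features the authors use, so the four computed here (average word length, average sentence length, type-token ratio, punctuation ratio) and the function name `extract_features` are illustrative assumptions, not the paper's feature set.

```python
import re
import string


def extract_features(text: str) -> dict[str, float]:
    """Compute a few simple stylometric features for one paragraph.

    Hypothetical feature set chosen for illustration only; the paper's
    actual features are not given in the abstract.
    """
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on terminal punctuation; empty pieces dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "punctuation_ratio": (
            sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0
        ),
    }
```

Applied to every human-written and chatbot-generated paragraph in a labeled dataset, vectors like these could serve as input to any standard binary classifier.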

Authors (4)
  1. Gauri Anil Godghase (1 paper)
  2. Rishit Agrawal (2 papers)
  3. Tanush Obili (2 papers)
  4. Mark Stamp (56 papers)