ChatGPT v Bard v Bing v Claude 2 v Aria v human-expert. How good are AI chatbots at scientific writing? (2309.08636v3)

Published 14 Sep 2023 in cs.CL, cs.AI, cs.CY, cs.ET, and cs.HC

Abstract: Historical emphasis on writing mastery has shifted with advances in generative AI, especially in scientific writing. This study analysed six AI chatbots for scholarly writing in humanities and archaeology. Using methods that assessed factual correctness and scientific contribution, ChatGPT-4 showed the highest quantitative accuracy, closely followed by ChatGPT-3.5, Bing, and Bard. However, Claude 2 and Aria scored considerably lower. Qualitatively, all AIs exhibited proficiency in merging existing knowledge, but none produced original scientific content. Interestingly, our findings suggest ChatGPT-4 might represent a plateau in LLM size. This research emphasizes the unique, intricate nature of human research, suggesting that AI's emulation of human originality in scientific writing is challenging. As of 2023, while AI has transformed content generation, it struggles with original contributions in humanities. This may change as AI chatbots continue to evolve into LLM-powered software.

Authors (2)
  1. Edisa Lozić (1 paper)
  2. Benjamin Štular (1 paper)
Citations (25)