Generative AI in Academic Writing: A Comparison of DeepSeek, Qwen, ChatGPT, Gemini, Llama, Mistral, and Gemma (2503.04765v2)
Abstract: DeepSeek v3, developed in China, was released in December 2024, followed by Alibaba's Qwen 2.5 Max in January 2025 and Qwen3 235B in April 2025. These free and open-source models offer significant potential for academic writing and content creation. This study evaluates their academic writing performance by comparing them with ChatGPT, Gemini, Llama, Mistral, and Gemma. There is a critical gap in the literature concerning how extensively these tools can be utilized and their potential to generate original content in terms of quality, readability, and effectiveness. Using 40 papers on Digital Twin and Healthcare, texts were generated through AI tools based on posed questions and paraphrased abstracts. The generated content was analyzed using plagiarism detection, AI detection, word count comparisons, semantic similarity, and readability assessments. Results indicate that paraphrased abstracts showed higher plagiarism rates, while question-based responses also exceeded acceptable levels. AI detection tools consistently identified all outputs as AI-generated. Word count analysis revealed that all chatbots produced a sufficient volume of content. Semantic similarity tests showed a strong overlap between generated and original texts. However, readability assessments indicated that the texts were insufficient in terms of clarity and accessibility. This study comparatively highlights the potential and limitations of popular and latest LLMs for academic writing. While these models generate substantial and semantically accurate content, concerns regarding plagiarism, AI detection, and readability must be addressed for their effective use in scholarly work.
- Omer Aydin (13 papers)
- Enis Karaarslan (12 papers)
- Fatih Safa Erenay (1 paper)
- Nebojsa Bacanin (11 papers)