"I know myself better, but not really greatly": How Well Can LLMs Detect and Explain LLM-Generated Texts? (2502.12743v2)

Published 18 Feb 2025 in cs.CL and cs.AI

Abstract: Distinguishing between human- and LLM-generated texts is crucial given the risks associated with misuse of LLMs. This paper investigates detection and explanation capabilities of current LLMs across two settings: binary (human vs. LLM-generated) and ternary classification (including an "undecided" class). We evaluate six closed- and open-source LLMs of varying sizes and find that self-detection (LLMs identifying their own outputs) consistently outperforms cross-detection (identifying outputs from other LLMs), though both remain suboptimal. Introducing a ternary classification framework improves both detection accuracy and explanation quality across all models. Through comprehensive quantitative and qualitative analyses using our human-annotated dataset, we identify key explanation failures, primarily reliance on inaccurate features, hallucinations, and flawed reasoning. Our findings underscore the limitations of current LLMs in self-detection and self-explanation, highlighting the need for further research to address overfitting and enhance generalizability.



Summary

  • The paper shows that a ternary classification approach improves LLM detection accuracy by an average of 5.6% over binary classification.
  • The paper reveals that explanation quality is compromised by inaccurate features, hallucinations, and flawed reasoning.
  • The paper emphasizes the need for improved LLM architectures and fine-tuning to achieve more reliable self-detection and explanation.

How Well Can LLMs Detect and Explain Generated Texts?

Introduction

The paper investigates the ability of LLMs to detect and explain their own generated texts across two classification tasks: binary and ternary. It provides comprehensive qualitative and quantitative analyses of the challenges in distinguishing human-generated texts (HGTs) from LLM-generated texts (LGTs). Experiments on six open-source and proprietary LLMs of varying sizes show that self-detection (a model identifying its own outputs) consistently outperforms cross-detection (identifying outputs from other models), though both remain suboptimal.
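
As a minimal sketch of this evaluation design (not the authors' code; the detector and sample formats here are assumptions), the self- vs. cross-detection comparison can be framed as an accuracy matrix over (detector, generator) pairs, where diagonal entries measure self-detection:

```python
from typing import Callable

# A detector is any callable mapping a text to a predicted label
# ("human" or "llm"); each sample pairs a text with its gold label.
Detector = Callable[[str], str]

def detection_matrix(
    detectors: dict[str, Detector],
    samples_by_generator: dict[str, list[tuple[str, str]]],
) -> dict[tuple[str, str], float]:
    """Accuracy for every (detector model, generator model) pair."""
    results: dict[tuple[str, str], float] = {}
    for det_name, detect in detectors.items():
        for gen_name, samples in samples_by_generator.items():
            correct = sum(detect(text) == label for text, label in samples)
            results[(det_name, gen_name)] = correct / len(samples)
    return results

# Self-detection accuracy for model m is results[(m, m)]; the paper's
# finding is that these diagonal entries tend to exceed off-diagonal ones.
```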

Binary vs. Ternary Classification

The paper delineates two detection tasks: binary classification, where the model categorizes a text as either an HGT or an LGT, and ternary classification, which adds an "Undecided" category. Introducing the "Undecided" category yields a statistically significant improvement in both the detection accuracy and the explanation quality for LGTs. The experimental results indicate that the ternary task sharpens the models' ability to handle nuanced cases, increasing detection performance by an average of 5.6%.
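
Operationally, the only difference between the two settings is the label set exposed to the model. Here is a minimal sketch assuming a simple prompt-based classifier; the prompt wording and parsing convention are illustrative, not the paper's exact prompts:

```python
BINARY_LABELS = ["Human-generated", "LLM-generated"]
TERNARY_LABELS = ["Human-generated", "LLM-generated", "Undecided"]

def build_prompt(text: str, labels: list[str]) -> str:
    """Prompt asking the model under test for a label plus an explanation."""
    options = ", ".join(labels)
    return (
        "Decide who wrote the following text.\n"
        f"Answer with exactly one of: {options}.\n"
        "Then briefly explain your decision.\n\n"
        f"Text:\n{text}"
    )

# Usage: send build_prompt(text, TERNARY_LABELS) to the detector LLM and
# parse the first line of its reply as the predicted label.
```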

Challenges in Explanation

Despite noticeable performance improvements in classification tasks, the paper identifies multiple challenges in generating accurate explanations for detected texts. LLMs often failed to provide reliable explanations due to reliance on inaccurate features, hallucinations, and incorrect reasoning—issues that were found to be prevalent in self-detection scenarios.

Inaccurate Features: LLMs frequently misattributed features as being indicative of machine authorship, leading to incorrect classifications. This was particularly evident when models incorrectly labeled complex emotional or logical constructs as inherently human, disregarding the advanced capabilities of present-day LLMs to mimic such depth.

Hallucinations: The models sometimes identified non-existent characteristics or misrepresented text features, further complicating the reliability of their explanations.

Incorrect Reasoning: Even when text features were correctly identified, LLMs showed flawed reasoning processes, leading to incorrect judgments about text origin.
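
These three failure modes lend themselves to a simple per-explanation tagging scheme. A minimal sketch of how such tags might be tallied follows; the annotation format is hypothetical, not the paper's schema:

```python
from collections import Counter
from enum import Enum

class Failure(Enum):
    INACCURATE_FEATURE = "inaccurate feature"
    HALLUCINATION = "hallucination"
    INCORRECT_REASONING = "incorrect reasoning"

def failure_rates(annotations: list[set[Failure]]) -> dict[Failure, float]:
    """Fraction of explanations exhibiting each failure mode.

    Each element of `annotations` is the set of failure tags that human
    annotators assigned to one model explanation (empty set = no failure).
    """
    counts = Counter(tag for tags in annotations for tag in tags)
    total = len(annotations)
    return {f: counts.get(f, 0) / total for f in Failure}
```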

Model Performance and Human-Annotated Data

The research contributes a benchmark dataset with human annotations of explanation correctness, manually assessed by student annotators. Models like GPT-4o achieved the strongest classification performance yet still frequently produced hallucinated explanations. The results underscore the need to improve interpretability and reasoning transparency in LLM-based detectors.
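
To illustrate how such correctness judgments can be aggregated into an explanation-accuracy score, here is a minimal sketch; the majority-vote aggregation is an assumption, as the summary does not specify the paper's exact procedure:

```python
def explanation_accuracy(judgments: list[list[bool]]) -> float:
    """Fraction of explanations judged correct by a majority of annotators.

    `judgments[i]` holds the binary correctness votes that annotators
    gave to the i-th explanation.
    """
    correct = sum(sum(votes) > len(votes) / 2 for votes in judgments)
    return correct / len(judgments)

# Example: three explanations, each rated by three annotators.
assert explanation_accuracy(
    [[True, True, False], [False, False, True], [True, True, True]]
) == 2 / 3
```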

Implications and Future Work

The results support using ternary rather than binary classification to handle ambiguous texts effectively. The fundamental limitations identified in LLM explainability point to a research path focused on explanation reliability, involving improvements not only in LLM architecture but also in fine-tuning techniques. The paper also hints at potential advances in collaborative LLM systems, in which multiple models pool their reasoning capabilities to reduce explanation errors across diverse datasets.

Conclusion

Improvements in LLM-based detectors, particularly when integrating ternary classification and addressing explanation reliability, demonstrate promise for practical applications in automated content moderation, academic integrity, and misinformation detection. Future strategies should prioritize the refinement of explainability features and cross-detection capabilities, ensuring models remain trustworthy and interpretable to their users.
