AdaDetectGPT: Adaptive LLM Text Detector
- AdaDetectGPT is an adaptive classifier that learns a witness function to transform token log-probabilities, significantly improving separation between human and LLM-generated text.
- It optimizes the witness function via a closed-form linear system using B-spline features, ensuring computational efficiency and finite-sample statistical guarantees.
- Experimental evaluations across diverse datasets show AdaDetectGPT improving AUC over baseline detectors by up to 58%, validating its robust performance in real-world scenarios.
AdaDetectGPT is an adaptive classifier for distinguishing LLM-generated text from human-authored text. It advances prior logits-based detectors by learning a witness function that optimally transforms log-probability statistics, yielding robust performance improvements and finite-sample statistical guarantees. AdaDetectGPT integrates domain knowledge, rigorous optimization, and accessible implementation to address the evolving challenge of machine-generated text detection across diverse datasets and LLMs.
1. Adaptive Witness Function and Detection Statistic
AdaDetectGPT builds on the foundational approach of probability-based detectors, which estimate the likelihood that a given text sequence was produced by an LLM versus a human author. Traditional approaches such as DetectGPT or Fast-DetectGPT rely on summary statistics of the token log-probabilities $\log p_\theta(x_t \mid x_{<t})$, where $p_\theta$ is the conditional distribution of the source LLM. AdaDetectGPT generalizes this by introducing a witness function that adaptively transforms these logits.
The AdaDetectGPT detection statistic is

$$S_w(x) = \frac{1}{T} \sum_{t=1}^{T} w\big(\log p_\theta(x_t \mid x_{<t})\big),$$

where the witness function $w$ is learned from data. This statistic adapts its transformation of the log-probabilities to maximize the separation between human and machine-generated text, supplanting the fixed log-probability or curvature metrics used by earlier detectors.
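As a concrete illustration, here is a minimal sketch of such a statistic in NumPy. The function name `detection_statistic`, the t-statistic-style normalization, and the simulated log-probabilities are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

def detection_statistic(token_logprobs: np.ndarray, w) -> float:
    """Average witness-transformed token log-probability, standardized
    so scores are comparable across passage lengths (an assumed form)."""
    transformed = w(token_logprobs)  # apply the witness function elementwise
    se = transformed.std(ddof=1) / np.sqrt(len(transformed))
    return transformed.mean() / (se + 1e-12)

# Toy usage with an untrained identity witness and simulated logits:
rng = np.random.default_rng(0)
logprobs = rng.normal(loc=-3.0, scale=1.0, size=200)  # stand-in for LLM logits
print(detection_statistic(logprobs, w=lambda u: u))
```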
2. Witness Function Learning and Optimization
The core methodological advance is optimizing the witness function $w$ over a function class $\mathcal{W} = \{\, w(u) = \beta^\top \phi(u) \,\}$, where $\phi(u)$ represents a vector of basis functions (implemented as B-splines) and $\beta$ is a parameter vector. The optimization seeks the $\beta$ that maximizes a lower bound on classification accuracy:
- The objective is to maximize the true negative rate (TNR) on human-authored texts, subject to controlling the false negative rate (FNR) on LLM-generated texts at a target level $\alpha$.
- In practice, expectations are estimated via empirical averages over sampled data, and the optimal $\beta$ is obtained by solving a linear system, which is computationally efficient (see the sketch below).
The learned $w$ produces a more informative statistic that amplifies subtle distributional differences between human and LLM-generated text, yielding improvements over methods that use only raw log-probabilities or heuristic transforms.
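To make the closed-form step concrete, the sketch below fits a linear-in-features witness over a cubic B-spline basis with a single ridge-regularized linear solve. The helper names (`bspline_features`, `fit_witness`), the Fisher-discriminant-style objective, and the ridge term are illustrative assumptions; the paper's actual TNR/FNR objective yields a different but similarly structured linear system.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_features(u, n_bases=8, degree=3, lo=None, hi=None):
    """Evaluate a cubic B-spline basis phi(u); returns a (len(u), n_bases) matrix."""
    lo = u.min() if lo is None else lo
    hi = u.max() if hi is None else hi
    inner = np.linspace(lo, hi, n_bases - degree + 1)
    knots = np.r_[[lo] * degree, inner, [hi] * degree]  # clamped knot vector
    return BSpline.design_matrix(np.clip(u, lo, hi), knots, degree).toarray()

def fit_witness(logprobs_human, logprobs_machine, n_bases=8, ridge=1e-3):
    """Closed-form witness fit: one ridge-regularized linear solve that
    separates mean B-spline features of human vs. machine log-probabilities."""
    lo = min(logprobs_human.min(), logprobs_machine.min())
    hi = max(logprobs_human.max(), logprobs_machine.max())
    Ph = bspline_features(logprobs_human, n_bases, lo=lo, hi=hi)
    Pm = bspline_features(logprobs_machine, n_bases, lo=lo, hi=hi)
    diff = Ph.mean(axis=0) - Pm.mean(axis=0)               # mean feature gap
    cov = np.cov(np.vstack([Ph, Pm]).T) + ridge * np.eye(n_bases)
    beta = np.linalg.solve(cov, diff)                      # the linear system
    return lambda u: bspline_features(u, n_bases, lo=lo, hi=hi) @ beta
```

A fitted `w = fit_witness(...)` can be passed directly to the `detection_statistic` sketch above; in both cases the only optimization cost is a small dense linear solve, which is what keeps training time negligible.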
3. Theoretical Guarantees for Classification Metrics
AdaDetectGPT provides rigorous statistical guarantees on key detection metrics under realistic assumptions (such as large passage length $T$, variance comparability, and certain stochastic dominance conditions):
- Under these conditions, the distribution of the standardized statistic $S_w(x)$ for machine-generated text converges to a standard normal as $T \to \infty$, via a martingale central limit theorem (MCLT).
- The classification threshold can therefore be set at the $\alpha$-quantile of the standard normal distribution, ensuring the FNR is asymptotically controlled at level $\alpha$.
- A lower bound for the true negative rate (TNR) on human-written text is derived (assuming regularity conditions), of the form

  $$\mathrm{TNR} \ge \Phi\big(\Delta(w) - z_{1-\alpha}\big) - o(1),$$

  where $\Delta(w)$ quantifies the (suitably normalized) population-level discrepancy between transformed logit expectations and $\Phi$ denotes the standard normal CDF.
- As $T$ and the adaptation sample size grow, AdaDetectGPT's FNR approaches the target level $\alpha$ and its TNR approaches the oracle rate, with explicit finite-sample bounds.
The framework also provides analogous convergence results and bounds for TPR (true positive rate) and FPR (false positive rate).
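A toy numerical check of the threshold rule, under the assumption (from the asymptotics above) that standardized machine-text scores are approximately N(0, 1) and human-text scores are shifted by the discrepancy; the score samples below are simulated, not real detector outputs.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05                  # target FNR level on machine-generated text
tau = norm.ppf(alpha)         # alpha-quantile of the standard normal

rng = np.random.default_rng(1)
machine_scores = rng.normal(0.0, 1.0, size=100_000)  # assumed N(0, 1) limit
human_scores = rng.normal(-2.0, 1.0, size=100_000)   # shifted by Delta(w) = 2

# Classify "machine" when score >= tau; the miss rate on machine text is the FNR.
fnr = np.mean(machine_scores < tau)   # ~ alpha by construction
tnr = np.mean(human_scores < tau)     # ~ Phi(Delta - z_{1-alpha}) ~ 0.64 here
print(f"FNR = {fnr:.3f} (target {alpha}), TNR = {tnr:.3f}")
```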
4. Empirical Performance Across Settings
Extensive experiments compare AdaDetectGPT against a spectrum of baselines (Likelihood, Entropy, LogRank, DetectGPT, Fast-DetectGPT, DNA-GPT, Binoculars, RADAR, etc.) across multiple datasets (SQuAD, WritingPrompts, XSum, Yelp, Essay) and LLM configurations.
| Setting | AdaDetectGPT Improvement (AUC) |
| --- | --- |
| White-box | 12.5%–37% over baseline |
| Black-box | Up to 58% |
- In the white-box setting (the scoring and generating LLMs are identical), AdaDetectGPT uniformly boosts AUC.
- In black-box scenarios (the scoring and generating LLMs differ), gains of up to 58% are observed.
- These improvements are consistent across text genres—news, academic writing, creative stories, product reviews.
- Detailed tabular comparisons show AdaDetectGPT outperforming the strongest existing detectors. All metrics and thresholds are calculated directly from empirical distributions using the adapted statistic.
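For reference, the headline AUC metric is computed from such empirical score distributions; below is a minimal sketch using scikit-learn, with simulated placeholder scores standing in for the adapted statistic evaluated on labeled passages.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
# Placeholder scores: machine passages score higher than human ones on average.
machine = rng.normal(0.0, 1.0, size=500)
human = rng.normal(-2.0, 1.0, size=500)

scores = np.concatenate([machine, human])
labels = np.concatenate([np.ones(500), np.zeros(500)])  # 1 = LLM-generated
print("AUC:", roc_auc_score(labels, scores))
```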
5. Implementation Efficiency and Reproducibility
AdaDetectGPT is computationally efficient:
- The witness function is learned via a closed-form linear system in one dimension, using B-spline basis expansion.
- Training typically takes less than one minute, and the memory footprint is minimal.
- The detector does not require deep network fine-tuning, large auxiliary classifiers, or expensive sample queries.
- Full Python implementation and usage instructions are available at https://github.com/Mamba413/AdaDetectGPT.
This practical design ensures accessibility for real-world deployment and rapid experimentation with new datasets and LLMs.
6. Applications and Future Directions
AdaDetectGPT is suitable for applications requiring robust discrimination of LLM-generated text, including:
- Fake news screening
- Academic integrity enforcement
- Social media content verification
- Automated content moderation
Future work aims to:
- Extend statistical guarantees to black-box scenarios involving differing source and target LLMs.
- Address complexities such as varying sampling strategies (temperature, top-$k$), adversarial evasion attempts, and highly edited or paraphrased texts.
- Integrate with watermarking and learned representation-based detectors for hybrid robustness.
- Develop theoretical analysis for adaptation under distribution shift and real-world perturbations.
A plausible implication is that AdaDetectGPT could become a core component of adaptive, statistically grounded content authentication systems for LLM-powered platforms.
7. Summary
AdaDetectGPT introduces an adaptive witness function approach to enhance logits-based detection of machine-generated text. By learning a transformation of token log-probabilities optimized for discrimination, it yields statistically principled and empirically validated improvements over state-of-the-art detectors. The method comes with explicit finite-sample guarantees for all practical performance metrics, minimal computational requirements, and reproducible implementation, positioning it as a central methodology in robust, scalable LLM output detection. Future research is directed toward expanding theoretical coverage, improving resistance to sophisticated evasion, and integrating with complementary detection paradigms.