Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection (2406.12570v1)
Abstract: In this paper, we study the problem of detecting machine-generated text when the LLM it is possibly derived from is unknown. We do so by apply ensembling methods to the outputs from DetectGPT classifiers (Mitchell et al. 2023), a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) LLM is the same as the discriminative (or scoring) LLM. We find that simple summary statistics of DetectGPT sub-model outputs yield an AUROC of 0.73 (relative to 0.61) while retaining its zero-shot nature, and that supervised learning methods sharply boost the accuracy to an AUROC of 0.94 but require a training dataset. This suggests the possibility of further generalisation to create a highly-accurate, model-agnostic machine-generated text detector.